NOVEL SURFACE EXPOSED PROTEINS FROM CHLAMYDIA PNEUMONIAE



The present invention relates to the identification of
members of a gene family from the human respiratory pathogen
Chlamydia pneumoniae, encoding surface exposed membrane
proteins of a size of approximately 89-101 kDa and of 56-57
kDa, preferably about 89.6-100.3 kDa and about 56.1 kDa. The
invention relates to the novel DNA sequences, the deduced
amino acid sequences of the corresponding proteins and the
use of the DNA sequences and the proteins in diagnosis of
infections caused by C. pneumoniae, in pathology, in
epidemiology, and as vaccine components.

GENERAL BACKGROUND 

C. pneumoniae is an obligate intracellular bacterium
(Christiansen and Birkelund (1992); Grayston et al. (1986)).
It has a cell wall structure like that of Gram negative bacteria,
with an outer membrane, a periplasmic space, and a cytoplasmic
membrane. It is possible to purify the outer membrane from
Gram negative bacteria with the detergent sarkosyl. This
fraction is named the "outer membrane complex (OMC)" (Caldwell
et al. (1981)). The COMC (Chlamydia outer membrane complex)
of C. pneumoniae contains four groups of proteins: a high
molecular weight protein of 98 kDa as determined by SDS-PAGE, a
double band of the cysteine rich outer membrane protein 2
(Omp2) of 62/60 kDa, the major outer membrane protein
(MOMP) of 38 kDa, and the low molecular weight lipoprotein
Omp3 of 12 kDa. The Omp2/Omp3 and MOMP proteins are present
in COMC from all Chlamydia species, and these genes have been
cloned from C. trachomatis, C. psittaci and C.
pneumoniae. However, the gene encoding the 98 kDa protein from
C. pneumoniae COMC has not been characterized or cloned.

The current state of C. pneumoniae serology and detection 

C. pneumoniae is an obligate intracellular bacterium
belonging to the genus Chlamydia, which can be divided into






WO 98/58953 PCT/DK98/00266


four species: C. trachomatis, C. pneumoniae, C. psittaci and
C. pecorum. Common to the four species is their obligate
intracellular growth and their biphasic life
cycle, with an extracellular infectious particle (the
elementary body, EB) and an intracellular replicating form
(the reticulate body, RB). In addition, the Chlamydia species
are characterized by a common lipopolysaccharide (LPS)
epitope that is highly immunogenic in human infection. C.
trachomatis causes the human ocular infection trachoma
and genital infections. C. psittaci is a variable group of
animal pathogens of which the avian strains can occasionally
infect humans and give rise to a severe pneumonia
(ornithosis). The first C. pneumoniae isolate was obtained
from an eye infection, but it was classified as a non-typable
Chlamydia. During an epidemic outbreak of pneumonia in Finland
it was realized that the patients had a positive reaction in
the Chlamydia genus specific test (the Lygranum test), and
the patients showed a titre increase to the untyped Chlamydia
isolates. Similar isolates were obtained in an outbreak of
upper respiratory tract infections in Seattle, and the
Chlamydia isolates were classified as a new species,
Chlamydia pneumoniae (Grayston et al. (1989)). In addition,
C. pneumoniae is suggested to be involved in the development
of atherosclerotic lesions and in the initiation of bronchial
asthma (Kuo et al. (1995)). These two conditions are thought
to be caused by chronic infection, by a
hypersensitivity reaction, or both.

Diagnosis of Chlamydia pneumoniae infections

Diagnosis of acute respiratory tract infection with C.
pneumoniae is difficult. Cultivation of C. pneumoniae from
patient samples is insensitive, even when proper tissue
culture cells are selected for the isolation. A C. pneumoniae
specific polymerase chain reaction (PCR) has been developed
by Campbell et al. (1992).
Even though Chlamydia pneumoniae has been detected by this PCR
in several studies, it is debated whether this method is
suitable for detection in all clinical situations. The
reason for this is that the cells carrying Chlamydia
pneumoniae in acute respiratory infections have not been
determined, and that a chronic carrier state is expected but
it is unknown in which organs and cells the bacteria are present.
Furthermore, the PCR test is difficult to perform due to the
low yield of these bacteria and due to the presence of
inhibitory substances in the patient samples. Therefore, it
will be of great value to develop sensitive and specific
sero-diagnostics for detecting both acute and chronic
infections. Sero-diagnosis of Chlamydia infections is
currently based either on genus specific tests such as the
Lygranum test and ELISA, measuring the antibodies to LPS, or on
the more species specific tests in which antibodies to purified
EBs are measured by microimmunofluorescence (micro-IF) (Wang
et al. (1970)). However, the micro-IF method is read by
microscopy, and in order to ensure correct readings the
result must be compared to the results with C. trachomatis
used as antigen, due to the cross-reacting antibodies to the
common LPS epitope. Thus, there exists in the art an urgent
need for development of reliable methods for species specific
diagnosis of Chlamydia pneumoniae, as has been expressed in
Kuo et al. (1995): "...a rapid reliable laboratory test of
infection for the clinical laboratory is a major need in the
field". Furthermore, the possible involvement of C.
pneumoniae in atherosclerosis and bronchial asthma clearly
warrants the development of an effective vaccine.

DETAILED DISCLOSURE OF THE INVENTION

The present invention aims at providing means for efficient
diagnosis of infections with Chlamydia pneumoniae as well as
the development of effective vaccines against infection with
this microorganism. The invention thus relates to species
specific diagnostic tests for infection in a mammal, such as
a human, with Chlamydia pneumoniae, said tests being based on
the detection of antibodies against surface exposed membrane
proteins of a size of approximately 89-101 kDa and of 56-57
kDa, preferably of about 89.6-100.3 kDa and about 56.1 kDa
(the range in size of the deduced amino acid sequences was
from 89.6 to 100.3 kDa, except for Omp13 with a size of 56.1
kDa), or the detection of nucleic acid fragments encoding
such proteins or variants or subsequences thereof. The
invention further relates to the amino acid sequences of
proteins according to the invention, to variants and
subsequences thereof, and to nucleic acid fragments encoding
these proteins or variants or subsequences thereof. The
present invention further relates to antibodies against
proteins according to the invention. The invention also
relates to the use of nucleic acid fragments and proteins
according to the invention in diagnosis of Chlamydia
pneumoniae and vaccines against Chlamydia pneumoniae.

Prior to the disclosure of the present invention only a very
limited number of genes from C. pneumoniae had been
sequenced. These were primarily the genes encoding known C.
trachomatis homologues: MOMP, Omp2, Omp3, Kdo-transferase,
the heat shock protein genes GroEL/ES and DnaK, a
ribonuclease P homologue and a gene encoding a 76 kDa protein
of unknown function. The reason why so few genes have been
cloned to date is the very low yield of C. pneumoniae which
can be obtained after purification from the host cells. After
such purification the DNA must be purified from the EBs, and
at this step the C. pneumoniae DNA can easily be contaminated
with host cell DNA. In addition to these inherent
difficulties, it is exceedingly difficult to cultivate C.
pneumoniae and to use DNA technology to produce expression
libraries with very low amounts (a few µg) of DNA. It has been
known since 1993 (Melgosa et al., 1993) that a 98 kDa protein
is present in OMC from C. pneumoniae. Even though the protein
band of 98 kDa was mentioned to be part of the OMC of C.
pneumoniae by Melgosa, the gene sequences and thus the
deduced amino acid sequences have not been determined. Only
bands originating from Chlamydia pneumoniae proteins in
general separated by SDS-PAGE are described therein.
However, the gene encoding this protein had not been
determined before the present invention. Only a very weak or
no reaction with patient sera can be observed to the 98 kDa
protein (Campbell et al. 1990), and prior to the work of the
present inventors it had not been recognized that the 89-101
kDa proteins are surface exposed or that they are in fact
immunogenic. In this report it is described that a number of
human serum samples react with a C. pneumoniae protein that
in SDS-PAGE migrates at 98 kDa. The protein was not further
characterized and it is therefore not in conflict with the
present application.

Halme et al. (1997) described the presence of human T-cell
epitopes in C. pneumoniae proteins of 92-98 kDa. The proteins
were eluted from SDS-PAGE of total chlamydia proteins but the
identity of the proteins was not determined.

Use of antibodies to screen expression libraries is a well
known method to clone fragments of genes encoding antigenic
parts of proteins. However, since patient sera do not show a
significant reaction with the 98 kDa protein it has not been
possible to use patient serum to clone the proteins.

It was known that monoclonal antibodies generated by the
inventors reacted with conformational epitopes on the surface
of C. pneumoniae and that they also reacted with C.
pneumoniae OMC by immuno-electron microscopy (Christiansen et
al. 1994). Furthermore, the 98 kDa protein is the only
unknown protein from the C. pneumoniae OMC (Melgosa et al.
1993). The present inventors chose to take an unconventional
step in order to clone the gene encoding the hitherto unknown
98 kDa protein: C. pneumoniae OMC was purified and the highly
immunogenic conformational epitopes were destroyed by
SDS-treatment of the antigen before immunization. Thereby an
antibody (PAB 150) to less immunogenic linear epitopes was
obtained. This provided the possibility to obtain an
antiserum which could detect the protein, and it was shown
that a gene family encoding the 89-101 kDa and 56 kDa proteins
according to the invention could be detected in colony
blotting of recombinant E. coli.

Mice infected with C. pneumoniae generate antibodies to the
proteins identified by the inventors and named Omp4-15, but
these antibodies do not recognize the SDS treated heat denatured
antigens normally used for SDS-PAGE and immunoblotting. However, a
strong reaction was seen if the antigen was not heat
denatured. It is therefore highly likely that if a similar
reaction is seen in connection with human infections the
antigens of the present invention will be of invaluable use
in sero-diagnostic tests and may very likely be used as a
vaccine for the prevention of infections.


By generating antibodies against COMC from C. pneumoniae a
polyclonal antibody (PAB 150) was obtained which reacted with
all the proteins. This antibody was used to identify the
genes encoding the 89.6-100.3 kDa and 56.1 kDa proteins in an
expression library of C. pneumoniae DNA. A problem in
connection with the present invention was that a family
comprising a number of similar genes was found in C.
pneumoniae. Therefore, a large number of different clones
were required to identify clusters of fragments. Only because
the rabbit antibody generated by the use of SDS-denatured
antigens contained antibodies to a high number of different
epitopes positioned on different members of the protein
family did the inventors succeed in cloning and sequencing
four of the genes. One gene was fully sequenced, a second was
sequenced except for the distal part, and shorter fragments of
two additional genes were obtained by this procedure. To
obtain the DNA sequence of the additional genes and to search
for more members of the gene family, long range PCR with
primers derived from the sequenced genes and primers from
the genes already published in the database was used. This
approach gave rise to the detection of eight additional genes
belonging to this family. The genes were situated in two gene
clusters: Omp12, 11, 10, 5, 4, 13 and 14 in one cluster and
Omp6, 7, 8, 9 and 15 in the second. Full sequence was obtained
from Omp4, 5, 6, 7, 8, 9, 10, 11 and 13, and partial sequence
from Omp12 and 14. Omp13 was a truncated gene of 1545 nucleotides.
The rest of the full length genes were from 2526 (Omp7) to 2838
(Omp15) nucleotides. The deduced amino acid sequences
revealed putative polypeptides of 89.6 to 100.3 kDa, except
for Omp13 of 56.1 kDa. Alignment of the deduced amino acid
sequences showed a maximum identity of 49% (Omp5/Omp9) when
all the sequences were compared. Except for Omp13, the lowest
homology was to Omp7, with no more than 34% identity to any of
the other amino acid sequences. The scores for Omp13 were
29-32% to all the other sequences.

In the present context SEQ ID Nos 1 and 2 correspond to
Omp4, SEQ ID Nos 3 and 4 correspond to Omp5, SEQ ID Nos 5 and
6 correspond to Omp6, SEQ ID Nos 7 and 8 correspond to Omp7,
SEQ ID Nos 9 and 10 correspond to Omp8, SEQ ID Nos 11 and 12
correspond to Omp9, SEQ ID Nos 13 and 14 correspond to
Omp10, SEQ ID Nos 15 and 16 correspond to Omp11, SEQ ID Nos
17 and 18 correspond to Omp12, SEQ ID Nos 19 and 20
correspond to Omp13, SEQ ID Nos 21 and 22 correspond to
Omp14, and SEQ ID Nos 23 and 24 correspond to Omp15.
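
The numbering above follows a regular pattern: each Omp gene takes two consecutive SEQ ID numbers. The following is an illustrative sketch only, assuming that the odd number of each pair is the nucleic acid sequence and the even number the deduced amino acid sequence, which is consistent with the lists of nucleic acid and protein sequences given elsewhere in this specification. The function and variable names are hypothetical and are not part of the disclosure.

```python
# Hypothetical helper: tabulate the SEQ ID pair assigned to each Omp gene.
# Assumption: odd SEQ ID = DNA sequence, even SEQ ID = deduced protein sequence.
def seq_id_table(first_omp=4, last_omp=15):
    table = {}
    for index, omp_number in enumerate(range(first_omp, last_omp + 1)):
        dna_id = 2 * index + 1      # 1, 3, 5, ...
        protein_id = dna_id + 1     # 2, 4, 6, ...
        table[f"Omp{omp_number}"] = (dna_id, protein_id)
    return table


SEQ_IDS = seq_id_table()
```

For example, `SEQ_IDS["Omp13"]` gives the pair of SEQ ID numbers that the paragraph above assigns to Omp13.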

The estimated sizes of the Omp proteins of the present
invention are listed in the following. Omp4 has an estimated size of
98.9 kDa, Omp5 has an estimated size of 97.2 kDa, Omp6 has an
estimated size of 100.3 kDa, Omp7 has an estimated size of
89.7 kDa, Omp8 has an estimated size of 90.0 kDa, Omp9 has an
estimated size of 96.7 kDa, Omp10 has an estimated size of
98.4 kDa, Omp11 has an estimated size of 97.6 kDa, Omp13 has
an estimated size of 56.1 kDa, Omp12 and 14 being partial.
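
The specification does not state how the sizes were estimated from the deduced amino acid sequences. A conventional approach, shown here only as a minimal sketch, is to sum average residue masses and add one water molecule for the free termini. The table of residue masses uses standard average values; the function name is hypothetical, and post-translational modification and signal peptide cleavage are ignored.

```python
# Average residue masses in daltons (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # added once for the N- and C-terminal groups


def estimated_mass_kda(sequence):
    """Return the estimated mass in kDa of an unmodified polypeptide."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence is empty")
    unknown = set(sequence) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    total = sum(AVERAGE_RESIDUE_MASS[residue] for residue in sequence)
    return (total + WATER_MASS) / 1000.0
```

Applied to a full deduced sequence such as one of the even-numbered SEQ ID entries, a calculation of this kind would be expected to give a value in the range quoted above, although small differences can arise from the mass table used.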

Furthermore, SEQ ID No 25 is a subsequence of SEQ ID No 3,
SEQ ID No 26 is a subsequence of SEQ ID No 4, SEQ ID No 27 is
a subsequence of SEQ ID No 5, SEQ ID No 28 is a subsequence
of SEQ ID No 6, SEQ ID No 29 is a subsequence of SEQ ID No 7,
and SEQ ID No 30 is a subsequence of SEQ ID No 8.

Part of the Omp proteins were expressed as fusion proteins,
and mouse polyclonal monospecific antibodies against the
proteins were produced. The antibodies reacted with the
surface of C. pneumoniae in both immunofluorescence and
immunoelectron microscopy. This shows for the first time that
the 89-101 kDa and 56-57 kDa protein family in C. pneumoniae
comprises surface exposed outer membrane proteins. This
important finding leads to the realization that members of
the 89-101 kDa and 56-57 kDa C. pneumoniae protein family are
good candidates for the development of a sero-diagnostic test
for C. pneumoniae, as well as the development of a vaccine
against infections with C. pneumoniae based on using these
proteins. Furthermore, the proteins may be used as
epidemiological markers, and polyclonal monospecific sera
against the proteins can be used to detect C. pneumoniae in
human tissue or detect C. pneumoniae isolates in tissue
culture. Also, the genes encoding the 89-101 kDa and 56-57
kDa, such as the 89.6-100.3 kDa and 56.1 kDa, protein family may be
used for the development of a species specific diagnostic
test based on nucleic acid detection/amplification.

The full length Omp4 was cloned into an expression vector
system that allowed expression of the Omp4 polypeptide. This
polypeptide was used as antigen for immunization of a rabbit.
Since the protein was purified under denaturing conditions the
antibody did not react with the native surface of C.
pneumoniae, but it reacted with a 98 kDa protein in
immunoblotting where purified C. pneumoniae EB was used as
antigen. Furthermore, the antibody reacted in paraffin
embedded sections of lung tissue from experimentally infected
mice.

A broad aspect of the present invention relates to a species
specific diagnostic test for infection of a mammal, such as a
human, with Chlamydia pneumoniae, said test comprising
detecting in a patient or preferably in a patient sample the
presence of antibodies against proteins from the outer
membrane of Chlamydia pneumoniae, said proteins being of a
molecular weight of 89-101 kDa or 56-57 kDa, or detecting the
presence of nucleic acid fragments encoding said outer
membrane proteins or fragments thereof.

In the context of the present application, the term "patient
sample" should be taken to mean an amount of serum from a
patient, such as a human patient, or an amount of plasma from
said patient, or an amount of mucosa from said patient, or an
amount of tissue from said patient, or an amount of
expectorate, forced sputum or a bronchial [...], or an amount
of urine from said patient, or an amount of cerebrospinal
fluid from said patient, or an amount of atherosclerotic
lesion from said patient, or an amount of mucosal swabs from
said patient, or an amount of cells from a tissue culture
originating from said patient, or an amount of material which
in any way originates from said patient. The in vivo test in
a human according to the present invention includes a skin
test known in the art such as an intradermal test, e.g.
similar to a Mantoux test. In certain patients being very
sensitive to the test, such as is often the case with
children, the test could be non-invasive, such as a
superficial test on the skin, e.g. by use of a plaster.

In the present context, the term "89-101 kDa protein" means
proteins normally present in the outer membrane of Chlamydia
pneumoniae, which in SDS-PAGE can be observed as one or more
bands with an apparent molecular weight substantially in the
range of 89-101 kDa. From the deduced amino acid sequences
the molecular size varies from 89.6 to 100.3 kDa.

Within the scope of the present invention are species
specific sero-diagnostic tests based on the usage of the
genes belonging to the gene family disclosed in the present
application.

Preferred embodiments of the present invention relate to
species specific diagnostic tests according to the invention,
wherein the outer membrane proteins have sequences selected
from the group consisting of SEQ ID NO: 2, SEQ ID NO: 4, SEQ
ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 12, SEQ ID
NO: 14, SEQ ID NO: 16, SEQ ID NO: 18, SEQ ID NO: 20, SEQ ID
NO: 22, and SEQ ID NO: 24.

When used in connection with proteins according to the
present invention the term "variant" should be understood as
a sequence of amino acids which shows a sequence similarity
of less than 100% to one of the proteins of the invention. A
variant sequence can be of the same size or it can be of a
different size as the sequence it is compared to. A variant
will typically show a sequence similarity of at
least 50%, preferably at least 60%, more preferably at least
70%, such as at least 80%, e.g. at least 90%, 95% or 98%.

The term "sequence similarity" in connection with sequences
of proteins of the invention means the percentage of
identical and conservatively changed amino acid residues
(with respect to both position and type) in the proteins of
the invention and an aligned protein of equal or different
length. The term "sequence identity" in connection with
sequences of proteins of the invention means the percentage
of identical amino acids with respect to both position and
type in the proteins of the invention and an aligned protein
of equal or different length.
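
The two definitions above can be illustrated with a short calculation on a pair of already aligned sequences. This is a sketch under stated assumptions, not the procedure used by the inventors: the grouping of residues treated as "conservatively changed" and the choice of denominator (alignment columns in which at least one sequence has a residue) are assumptions made for illustration, and all names are hypothetical.

```python
# Assumed groups of residues regarded as conservative replacements.
CONSERVATIVE_GROUPS = (
    frozenset("ILVM"), frozenset("FYW"), frozenset("KRH"),
    frozenset("DE"), frozenset("NQ"), frozenset("ST"), frozenset("AG"),
)


def is_conservative(x, y):
    """True if both residues fall in the same assumed group."""
    return any(x in group and y in group for group in CONSERVATIVE_GROUPS)


def identity_and_similarity(aligned_a, aligned_b):
    """Return (percent identity, percent similarity) for two aligned
    sequences of equal length in which '-' marks a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns that are gaps in both sequences.
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == "-" and y == "-")
    ]
    if not columns:
        raise ValueError("no aligned residues")
    identical = sum(1 for x, y in columns if x == y)
    similar = sum(1 for x, y in columns if x == y or is_conservative(x, y))
    return 100.0 * identical / len(columns), 100.0 * similar / len(columns)
```

Because every identical position also counts as similar, the similarity returned is never lower than the identity, which matches the way the two figures are reported for the Omp family in this specification.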

Within the scope of the present invention are subsequences of
one of the proteins of the invention, meaning a consecutive
stretch of amino acid residues taken from SEQ ID NO: 2, SEQ
ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID
NO: 12, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 18, SEQ ID
NO: 20, SEQ ID NO: 22, or SEQ ID NO: 24. A subsequence will
typically comprise at least 100 amino acids, preferably at
least 80 amino acids, more preferably at least 70 amino
acids, such as 50 amino acids. It might even be as small as
10-50 amino acids, such as 20-40 amino acids, e.g. about 30
amino acids. A subsequence will typically show a sequence
homology of at least 50%, preferably at least 60%, more
preferably at least 70%, such as at least 80%, e.g. at least
90%, 95% or 98%.



Diagnostic tests according to the invention include
immunoassays selected from the group consisting of a direct
or indirect EIA such as an ELISA, an immunoblot technique
such as a Western blot, a radioimmunoassay, and any other
non-enzyme linked antibody binding assay or procedure such as
a fluorescence, agglutination or precipitation reaction, and
nephelometry.

A preferred embodiment of the present invention relates to
species specific diagnostic tests according to the invention,
said test comprising an ELISA, wherein antibodies against the
proteins of the invention or fragments thereof are detected
in samples.



A preferred embodiment of the invention is an ELISA based on
detection in samples of antibodies against proteins of the
invention. The ELISA may use proteins of the invention, or
variants thereof, i.e. the antigen, as coating agent. An
ELISA will typically be developed according to standard
methods well known in the art, such as methods described in
"Antibodies: A Laboratory Manual", Ed Harlow and David Lane,
Cold Spring Harbor Laboratory (1988), which is hereby
incorporated by reference.

Recombinant proteins will be produced using DNA sequences
obtained essentially using methods described in the examples
below. Such DNA sequences, comprising the entire coding
region of each gene in the gene family of the invention, will
be cloned into an expression vector from which the deduced
protein sequence can be purified. The purified proteins will
be analyzed for reactivity in ELISA using both monoclonal and
polyclonal antibodies as well as sera from experimentally
infected mice and human patient sera.

From the sera of experimentally infected mice it is known that
non-linear epitopes are recognized predominantly. Thus, it is
contemplated that different forms of purification schemes
known in the art will be used to analyze for the presence of
discontinuous epitopes, and to analyze whether the human
immune response is also directed against such epitopes.

Preferred embodiments of the present invention relate to
species specific diagnostic tests according to the invention,
wherein the nucleic acid fragments have sequences selected
from the group consisting of SEQ ID NO: 1, SEQ ID NO: 3, SEQ
ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9, SEQ ID NO: 11, SEQ ID
NO: 13, SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID
NO: 21, and SEQ ID NO: 23.



In connection with nucleic acid fragments according to the
present invention the term "variant" should be understood as
a sequence of nucleic acids which shows a sequence homology
of less than 100%. A variant sequence can be of the same size
or it can be of a different size as the sequence it is
compared to. A variant will typically show a sequence
homology of at least 50%, preferably at least 60%, more
preferably at least 70%, such as at least 80%, e.g. at least
90%, 95% or 98%.



The term "sequence homology" in connection with nucleic acid
fragments of the invention means the percentage of matching
nucleic acids (with respect to both position and type) in the
nucleic acid fragments of the invention and an aligned
nucleic acid fragment of equal or different length.

In order to obtain information concerning the general
distribution of each of the genes according to the present
invention, PCR will be performed for each gene on all
available C. pneumoniae isolates. This will provide
information on the general variability of the genes or
nucleic acid fragments of the invention. Variable regions
will be sequenced. From patient samples PCR will be used to
amplify variable parts of the genes for epidemiology.
Non-variable parts will be used for amplification by PCR and
analyzed for possible use as a diagnostic test. It is
contemplated that if variability is discovered, PCR of
variable regions can be used for epidemiology. PCR of
non-variable regions can be used as a species specific diagnostic
test. Using genes encoding proteins known to be invariable in
all known isolates prepared as targets for PCR to genes
encoding proteins with unknown function.
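
The distinction drawn above between variable and non-variable regions can be made concrete once the sequences of a gene from several isolates have been aligned. The following is a minimal sketch, assuming pre-aligned sequences of equal length with '-' for gaps; the function name and the minimum run length are hypothetical choices and not part of the disclosure. It reports stretches in which every isolate carries the same nucleotide, which are the kind of regions that might be considered as targets for a species specific PCR, while the remaining positions are candidates for epidemiological typing.

```python
def invariant_regions(aligned_sequences, min_length=1):
    """Return half-open (start, end) index pairs of alignment stretches
    in which all sequences are identical and contain no gap."""
    if not aligned_sequences:
        return []
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("aligned sequences must have the same length")

    regions = []
    run_start = None
    for i in range(length):
        column = {seq[i].upper() for seq in aligned_sequences}
        conserved = len(column) == 1 and "-" not in column
        if conserved:
            if run_start is None:
                run_start = i
        else:
            # A variable column ends the current invariant run, if any.
            if run_start is not None and i - run_start >= min_length:
                regions.append((run_start, i))
            run_start = None
    if run_start is not None and length - run_start >= min_length:
        regions.append((run_start, length))
    return regions
```

In practice `min_length` would be set to at least the length of a primer, so that only stretches long enough to carry one are returned.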

Particularly preferred embodiments of the present invention
relate to diagnostic tests according to the invention,
wherein detection of nucleic acid fragments is obtained by
using nucleic acid amplification, preferably polymerase chain
reaction (PCR).

Within the scope of the present invention is a PCR based test
directed at detecting nucleic acid fragments of the invention
or variants thereof. A PCR test will typically be developed
according to methods well known in the art and will typically
comprise a PCR test capable of detecting and differentiating
between nucleic acid fragments of the invention. Preferred
are quantitative competitive PCR tests or nested PCR tests.
The PCR test according to the invention will typically be
developed according to methods described in detail in EP B
540 588, EP A 586 112, EP A 643 140 or EP A 669 401, which
are hereby incorporated by reference.

Within the scope of the present invention are variants and
subsequences of one of the nucleic acid fragments of the
invention, meaning a consecutive stretch of nucleic acids
taken from SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID
NO: 7, SEQ ID NO: 9, SEQ ID NO: 11, SEQ ID NO: 13, SEQ ID NO:
15, SEQ ID NO: 19, SEQ ID NO: 21, or SEQ ID NO: 23. A variant
or subsequence will preferably comprise at least 100 nucleic
acids, preferably at least 80 nucleic acids, more preferably
at least 70 nucleic acids, such as at least 50 nucleic acids.
It might even be as small as 10-50 nucleic acids, such as
20-40 nucleic acids, e.g. about 30 nucleic acids. A
subsequence will typically show a sequence homology of at
least 30%, preferably at least 60%, more preferably at least
70%, such as at least 80%, e.g. at least 90%, 95% or 98%. The
shorter the subsequence, the higher the required homology.
Accordingly, a subsequence of 100 nucleic acids or fewer must
show a homology of at least 80%.
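
The length-dependent requirement stated above can be written as a simple check. This is a sketch only: it encodes the two figures given in the text (a general minimum of 30% and a minimum of 80% for subsequences of 100 nucleic acids or fewer) as default parameters, computes homology as the percentage of matching positions over the aligned length, and uses hypothetical function names. How the preferred higher thresholds would be applied is left to the caller.

```python
def percent_homology(aligned_a, aligned_b):
    """Percentage of matching positions between two aligned nucleic acid
    sequences of equal length ('-' marks a gap and never matches)."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_homology_requirement(subsequence_length, homology_percent,
                               general_minimum=30.0,
                               short_length=100, short_minimum=80.0):
    """Apply the stricter minimum to short subsequences."""
    if subsequence_length <= short_length:
        required = short_minimum
    else:
        required = general_minimum
    return homology_percent >= required
```

For a 10-nucleotide subsequence with one mismatch, `percent_homology` gives 90%, which satisfies the stricter requirement for short subsequences.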

A very important aspect of the present invention relates to
proteins of the invention derived from Chlamydia pneumoniae
having amino acid sequences selected from the group
consisting of SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ
ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 12, SEQ ID NO: 14, SEQ ID
NO: 16, SEQ ID NO: 18, SEQ ID NO: 20, SEQ ID NO: 22, and SEQ
ID NO: 24 having a sequence similarity of at least 50%,
preferably at least 60%, more preferably at least 70%, such
as at least 80%, e.g. at least 90%, 95% or 98% and a similar
biological function.

By the term "similar biological function" is meant that the
protein shows characteristics similar to those of the proteins
derivable from the membrane proteins of Chlamydia pneumoniae.
Such proteins comprise repeated motifs of GGAI (at least 2,
preferably at least 3 repeats) and/or conserved positions of
tryptophan (W).

Comparison of the DNA sequences from genes encoding Omp4-15
shows that the overall similarity between the individual
genes ranges from 43 to 55%. Comparison of the amino acid
sequences of Omp4-15 shows 34-49% identity and 53-64%
similarity. The homology is generally scattered along the
entire length of the deduced amino acid sequences. However, as seen
from Figure 8A-J, there are some regions in which the
homology is more pronounced. This is seen in the repeated
sequence where the sequence GGAI is repeated 4-7 times in the
genes. It is interesting that the DNA homology is not
conserved for the sequences encoding the four amino acids
GGAI. This may indicate a functional role of this part of the
protein and indicates that the repeated structure did not
occur by a duplication of the gene. In addition to the four
amino acid repeats GGAI, a region from amino acid 400 to 490
has a higher degree of homology than the rest of the protein,
with the conserved sequence FYDPI occurring in all sequences.
As a further indication of similarity in function, the amino
acid tryptophan (W) is perfectly conserved at 4-6
positions in the C-terminal part of the protein.
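
The sequence features described above (GGAI repeats, the FYDPI motif and C-terminal tryptophans) can be located in a deduced amino acid sequence with a few lines of code. The following sketch is illustrative only: the function names are hypothetical, and treating the "C-terminal part" as the last half of the sequence is an assumption made here, since the text does not define its extent.

```python
def motif_count(sequence, motif="GGAI"):
    """Count non-overlapping occurrences of a motif in a protein sequence."""
    return sequence.upper().count(motif.upper())


def tryptophan_positions(sequence, c_terminal_fraction=0.5):
    """Return 1-based positions of tryptophan (W) within the C-terminal
    part of the sequence, taken as the given trailing fraction."""
    sequence = sequence.upper()
    start = int(len(sequence) * (1 - c_terminal_fraction))
    return [i + 1 for i in range(start, len(sequence)) if sequence[i] == "W"]
```

For a made-up toy peptide such as `"MGGAIKGGAILFYDPIWAW"` (not taken from any SEQ ID), `motif_count` finds two GGAI repeats and one FYDPI motif. Comparing the tryptophan positions across aligned family members would then show which of them are conserved.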

Since none of the genes and deduced amino acid sequences of
the invention are identical, the following is within the scope
of the present invention: production of monospecific
antibodies, the use of said antibodies for characterizing
which C. pneumoniae proteins are expressed, the use of said
antibodies for characterizing at which time during the
developmental life cycle said C. pneumoniae proteins are
expressed, and the use of said antibodies for characterizing
the precise cellular localization of said C. pneumoniae
proteins. Also within the scope of the present invention is
the use of monospecific antibodies against proteins of the
invention for determining which part of said proteins is
surface exposed and how proteins in the C. pneumoniae COMC
interact with each other.

Preferred embodiments of the present invention relate to
polypeptides which comprise subsequences of the proteins of
the invention, said subsequences comprising the sequence
GGAI. Further preferred embodiments of the present invention
relate to polypeptides which comprise subsequences of the
proteins of the invention, said subsequences comprising the
sequence FSGE.

Polypeptides according to the invention will typically be of
a length of at least 6 amino acids, preferably at least 15
amino acids, preferably at least 20 amino acids, preferably
at least 25 amino acids, preferably at least 30 amino acids,
preferably at least 35 amino acids, preferably at least 40
amino acids, preferably at least 45 amino acids, preferably
at least 50 amino acids, preferably at least 55 amino acids,
preferably at least 100 amino acids.

A very important aspect of the present invention relates to
nucleic acid fragments of the invention derived from
Chlamydia pneumoniae, variants and subsequences thereof.

Another important aspect of the present invention relates to
antibodies against the proteins according to the invention,
such antibodies including polyclonal monospecific antibodies
and monoclonal antibodies against proteins with sequences
selected from the group consisting of SEQ ID NO: 2, SEQ ID
NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO:
12, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 18, SEQ ID NO:
20, SEQ ID NO: 22, and SEQ ID NO: 24.

A very important aspect of the present invention relates to 
diagnostic kits for the diagnosis of infection of a mammal,
such as a human, with Chlamydia pneumoniae, said kits
comprising one or more proteins with amino acid sequences
selected from the group consisting of SEQ ID NO: 2, SEQ ID
NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO:
12, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 18, SEQ ID NO:
20, SEQ ID NO: 22, and SEQ ID NO: 24.

Another very important aspect of the present invention 
relates to diagnostic kits for the diagnosis of infection of 
a mammal, such as a human, with Chlamydia pneumoniae, said
kits comprising antibodies against a protein with an amino
acid sequence selected from the group consisting of SEQ ID 
NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 
10, SEQ ID NO: 12, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 
18, SEQ ID NO: 20, SEQ ID NO: 22, and SEQ ID NO: 24. 

Antibodies included in a diagnostic kit according to the
invention can be polyclonal or monoclonal or a mixture
thereof.

Still another very important aspect of the present invention
relates to diagnostic kits for the diagnosis of infection of
a mammal, such as a human, with Chlamydia pneumoniae, said
kits comprising one or more nucleic acid fragments with
sequences selected from the group consisting of SEQ ID NO: 1,
SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9, SEQ
ID NO: 11, SEQ ID NO: 13, SEQ ID NO: 15, SEQ ID NO: 17, SEQ
ID NO: 19, SEQ ID NO: 21, and SEQ ID NO: 23.

An aspect of the present invention relates to a composition 
for immunizing a mammal, such as a human, against Chlamydia
pneumoniae, said composition comprising one or more proteins
with amino acid sequences selected from the group consisting
of SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8,
SEQ ID NO: 10, SEQ ID NO: 12, SEQ ID NO: 14, SEQ ID NO: 16,
SEQ ID NO: 18, SEQ ID NO: 20, SEQ ID NO: 22, and SEQ ID NO:
24.

An important role for the proteins of the invention in
prevention of infection of a mammal, such as a human, with C.
pneumoniae is expected. Thus proteins of the invention,
including variants and subsequences, will be produced,
typically by using recombinant techniques, and will then be
used as an antigen in immunization of mammals, such as
rabbits. Subsequently, the hyperimmune sera obtained by the
immunization will be analyzed for protection against C.
pneumoniae infection using a tissue culture assay. In
addition it is contemplated that monoclonal antibodies will
be produced, typically using standard hybridoma techniques,
and analyzed for protection against infection with C.
pneumoniae.

It is envisioned that particularly interesting and
immunogenic epitopes will be found in connection with the
proteins of the invention, which will comprise subsequences
of said proteins. It is preferred to use polypeptides
comprising such subsequences of the proteins of the invention
in immunizing a mammal, such as a human, against Chlamydia
pneumoniae.

An important aspect of the present invention relates to the 
use of proteins with sequences selected from the group 
consisting of SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ
ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 12, SEQ ID NO: 14, SEQ ID
NO: 16, SEQ ID NO: 18, SEQ ID NO: 20, SEQ ID NO: 22, and SEQ
ID NO: 24 in diagnosis of infection of a mammal, such as a
human, with Chlamydia pneumoniae.

A preferred embodiment of the present invention relates to
the use of proteins according to the invention in an 
undenatured form, in diagnosis of infection of a mammal, such 
as a human, with Chlamydia pneumoniae. 

A very important aspect of the present invention relates to 
the use of proteins with sequences selected from the group
consisting of SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ
ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 12, SEQ ID NO: 14, SEQ ID
NO: 16, SEQ ID NO: 18, SEQ ID NO: 20, SEQ ID NO: 22, and SEQ
ID NO: 24, for immunizing a mammal, such as a human, against
Chlamydia pneumoniae.

A preferred embodiment of the present invention relates to 
the use of proteins according to the invention in an 
undenatured form, for immunizing a mammal, such as a human, 
against Chlamydia pneumoniae. 

A very important aspect of the present invention relates to
the use of nucleic acid fragments with nucleotide sequences
selected from the group consisting of SEQ ID NO: 1, SEQ ID
NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9, SEQ ID NO:
11, SEQ ID NO: 13, SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO:
19, SEQ ID NO: 21, and SEQ ID NO: 23 for immunizing a mammal,
such as a human, against Chlamydia pneumoniae.

It is envisioned that one type of vaccine against C.
pneumoniae will be developed by using gene-gun vaccination of
mice. Typically, different genetic constructs containing
nucleic acid fragments, or combinations of nucleic acid
fragments, according to the invention will be used in the
gene-gun approach. The mice will subsequently be
analyzed for production of both humoral and cellular immune
responses and for protection against infection with C.
pneumoniae after challenge therewith.

In line with this, the invention also relates to the uses of
the proteins of the invention as a pharmaceutical (a vaccine) 
as well as to the uses thereof for the preparation of a 
vaccine against infections with Chlamydia pneumoniae. 

Preparation of vaccines which contain protein sequences as
active ingredients is generally well understood in the art,
as exemplified by U.S. Patents 4,608,251; 4,601,903;
4,599,231; 4,599,230; 4,596,792; and 4,578,770, all incorporated
herein by reference. Typically, such vaccines are prepared
as injectables either as liquid solutions or suspensions;
solid forms suitable for solution in, or suspension
in, liquid prior to injection may also be prepared. The
preparation may also be emulsified. The active immunogenic
ingredient is often mixed with excipients which are
pharmaceutically acceptable and compatible with the active
ingredient. Suitable excipients are, for example, water, saline,
dextrose, glycerol, ethanol, or the like, and combinations
thereof. In addition, if desired, the vaccine may contain
minor amounts of auxiliary substances such as wetting or
emulsifying agents, pH buffering agents, or adjuvants which
enhance the effectiveness of the vaccines.

The vaccines are conventionally administered parenterally, by
injection, for example, either subcutaneously or intramuscularly.
Additional formulations which are suitable for other
modes of administration include suppositories and, in some
cases, oral formulations. These compositions take the form of
solutions, suspensions, tablets, pills, capsules, sustained
release formulations or powders and contain 10-95% of active
ingredient, preferably 25-70%, and optionally a suitable
carrier.



The protein sequences may be formulated into the vaccine as
neutral or salt forms known in the art. The vaccines are
administered in a manner compatible with the dosage
formulation, and in such amount as will be therapeutically
effective and immunogenic. The quantity to be administered
depends on the subject to be treated. Suitable dosage ranges
are of the order of several hundred micrograms active
ingredient per vaccination with a preferred range from about
0.1 µg to 1000 µg. The immune response may be enhanced if the
vaccine further comprises an adjuvant substance as known in
the art. Other possibilities involve the use of
immunomodulating substances such as lymphokines (e.g. IFN-γ,
IL-2 and IL-12) or synthetic IFN-γ inducers such as poly I:C
in combination with the above-mentioned adjuvants.

It is also possible to produce a living vaccine by introducing,
into a non-pathogenic microorganism, at least one
nucleic acid fragment encoding a protein fragment or protein
of the invention, and effecting expression of the protein
fragment or the protein on the surface of the microorganism
(e.g. in the form of a fusion protein including a membrane
anchoring part or in the form of a slightly modified protein
or protein fragment carrying a lipidation signal which allows
anchoring in the membrane). The skilled person will know how
to adapt relevant expression systems for this purpose.

Another part of the invention is based on the fact that
recent research has revealed that a DNA fragment cloned in a
vector which is non-replicative in eukaryotic cells may be
introduced into an animal (including a human being) by e.g.
intramuscular injection or percutaneous administration (the
so-called "gene gun" approach). The DNA is taken up by e.g.
muscle cells and the gene of interest is expressed by a
promoter which is functioning in eukaryotes, e.g. a viral
promoter, and the gene product thereafter stimulates the
immune system. These newly discovered methods are reviewed in
Ulmer et al., 1993, which hereby is included by reference.

Thus, a nucleic acid fragment encoding a protein fragment or
protein of the invention may be used for effecting in vivo
expression of antigens, i.e. the nucleic acid fragments may be
used in so-called DNA vaccines. Hence, the invention also relates to
a vaccine comprising a nucleic acid fragment encoding a
protein fragment or a protein of the invention, the vaccine
effecting in vivo expression of antigen by a mammal, such as
a human, to whom the vaccine has been administered, the
amount of expressed antigen being effective to confer
substantially increased resistance to infections with
Chlamydia pneumoniae in a mammal, such as a human.

The efficacy of such a "DNA vaccine" can possibly be enhanced
by administering the gene encoding the expression product
together with a DNA fragment encoding a protein which has the
capability of modulating an immune response. For instance, a
gene encoding lymphokine precursors or lymphokines (e.g. IFN-γ,
IL-2, or IL-12) could be administered together with the
gene encoding the immunogenic protein fragment or protein,
either by administering two separate DNA fragments or by
administering both DNA fragments included in the same vector.

It is also a possibility to administer DNA fragments comprising
a multitude of nucleotide sequences which each encode
relevant epitopes of the protein fragments and proteins 
disclosed herein so as to effect a continuous sensitization 
of the immune system with a broad spectrum of these epitopes. 

The following experimental non-limiting examples are intended
to illustrate certain features and embodiments of the invention.

Figure 1. The figure shows electron microscopy of negative
stained purified C. pneumoniae EB (A) and purified OMC (B).

Figure 2. The figure shows silver stained 15% SDS-PAGE of
purified EB and OMC. Lane 1, purified C. pneumoniae EB; lane
2, C. pneumoniae OMC; lane 3, purified C. trachomatis EB; and
lane 4, C. trachomatis OMC.

Figure 3. The figure shows immunoblotting of C. pneumoniae EB
separated by 10% SDS-PAGE, transferred to nitrocellulose and
reacted with rabbit anti C. pneumoniae OMC.

Figure 4. The figure shows Coomassie blue stained 7.5%
SDS-PAGE of recombinant pEX clones that were detected by the
rabbit anti C. pneumoniae serum. The arrow indicates the
location of the 117 kDa β-galactosidase protein.

Figure 5. The figure shows immunoblotting of recombinant pEX
clones detected by colony blotting, separated by 7.5%
SDS-PAGE, transferred to nitrocellulose and reacted with
rabbit anti C. pneumoniae OMC. Lane 1, SeeBlue molecular
weight standard. Lanes 2-6, pEX clones cultivated at 42°C to
induce the production of the β-galactosidase fusion proteins.

Figure 6. The figure shows the sequencing strategy for Omp4 and
Omp5. Arrows indicate primers used for sequencing.

Figure 7. C. pneumoniae omp genes. The genes are arranged in
two clusters. In cluster 1, Omp12, 11, 10, 5, 4, 13, and 14
are found. In cluster 2 are found Omp6, 7, 8, 9, and 15.

Figure 8 A - J. The figure shows alignment of C. pneumoniae
Omp4-15, using the program pileup in the GCG package.

Figure 9. The figure shows immunofluorescence of C.
pneumoniae infected HeLa cells, 72 hrs. after infection, reacted
with mouse monospecific anti-serum against pEX3-36 fusion
protein. pEX3-36 is a part of the Omp5 gene.

Figure 10. The figure shows immunoblotting of C. pneumoniae
EB, lane 1-3 heated to 100°C in SDS-sample buffer, lane 4-6
unheated. Lane 1 reacted with rabbit anti C. pneumoniae OMC;
lane 2 and 4 pre-serum; lane 3 and 5 polyclonal rabbit anti
pEX1-1 fusion protein; lane 6 MAb 26.1.

Figure 11. The figure shows immunoblotting of C. pneumoniae
EB, lane 1-4 heated to 100°C in SDS-sample buffer, lane 5-6
unheated. Reacted with serum from C57 black mice 14 days
after infection with 10^7 CFU of C. pneumoniae. Lane 1 and 5
mouse 1; lane 2 and 6 mouse 2; lane 3 and 5 mouse 3; and lane
4 and 8 mouse 4.

Figure 12. The figure shows immunohistochemistry analysis of
mouse lung tissue with C. pneumoniae inclusions present both
in the bronchial epithelium and in the lung parenchyma
(arrows).

Cloning of the genes encoding the 98/95 kDa C. pneumoniae
COMC proteins

Purification of C. pneumoniae EBs and COMC

C. pneumoniae was cultivated in HeLa cells. Cultivation was
done according to the specifications of Miyashita and
Matsumoto (1992), with the modification that centrifugation
of supernatant and of the later precipitate and turbid bottom
layer was carried out at 100,000 x g. The microorganism was
attached to the HeLa cells by 30 minutes of centrifugation at
1000 x g, after which the cells were incubated in RPMI 1640
medium (Gibco BRL, Germany cat No. 51800-27), containing 5%
foetal calf serum (FCS, Gibco BRL, Germany Cat No. 10106.169)
and gentamicin for two hours at 37°C in a 5% CO2 atmosphere. The
medium was changed to medium that in addition contained 1 µg
per ml of cycloheximide. After 48 hours of incubation a
coverslip was removed from the cultures and the inclusion was
tested with an antibody specific for C. pneumoniae (MAb 26.1)
(Christiansen et al. 1994) and a monoclonal antibody specific
for the species C. trachomatis (MAb 32.3, Loke diagnostics,
Arhus, Denmark) to ensure that no contamination with C.
trachomatis had occurred. The HeLa cells were tested by
Hoechst stain for Mycoplasma contamination as well as by
culture in BEa and BEg medium (Freund et al., 1979). The
C. pneumoniae stocks were also tested for Mycoplasma
contamination by cultivation in BEa and BEg medium. No
contamination with C. trachomatis, Mycoplasmas or bacteria
was detected in cultures or cells. 72 hours post-infection
the monolayer was washed in PBS, the cells were loosened in
PBS with a rubber policeman, and the Chlamydia were liberated
from the host cell by sonication. The C. pneumoniae EBs and
RBs were purified on discontinuous density gradients
(Miyashita et al. (1992)). The purity of the Chlamydia EBs
was verified by negative staining and electron microscopy
(Figure 1); only particles of a size of 0.3 to 0.5 µm were
detected, in agreement with the structure of C. pneumoniae EBs.
The purified Chlamydia EBs were subjected to sarkosyl
extraction as described by Caldwell et al. (1981) with the
modification that a brief sonication was used to suspend the
COMC. The purified COMC was tested by electron microscopy and
negative staining (Figure 1), where a folded outer membrane
complex was seen.

SDS-PAGE analysis of purified EBs and COMC 

The proteins from purified EBs and C. pneumoniae OMC were
separated on a 15% SDS-polyacrylamide gel, and the gel was
silver stained (Figure 2). In lane 1 it is seen that the
purified EBs contain major proteins of 100/95 kDa and a
protein of 38 kDa; in the purified COMC (lane 2) these two
protein groups are also dominant. In addition, proteins with
a molecular weight of 62/60 kDa, 55 kDa, and 12 kDa have been
enriched in the COMC preparation. When the purified C.
pneumoniae EBs are compared to purified C. trachomatis EB
(lane 3) it is seen that the predominant protein in the C.
trachomatis EB is the major outer membrane protein (MOMP),
and it is also the dominant band in the COMC preparation of
C. trachomatis (lane 4), and Omp2 of 60/62 kDa as well as
Omp3 at 12 kDa are seen in the preparation. However, no major
bands with a size of 100/95 kDa are detected as in the C.
pneumoniae COMC preparation.

Production of rabbit polyclonal antibodies against C.
pneumoniae COMC 

To ensure production of rabbit antibodies that would
recognize all the C. pneumoniae proteins in immuno-blotting
and colony-blotting, 10 µg of COMC antigen was dissolved in 20
µl of SDS sample buffer and thereafter divided into 5 vials.
The dissolved antigen was further diluted in one ml of PBS
and one ml of Freund's incomplete adjuvant (Difco laboratories,
USA cat. No. 0639-60-6) and injected into the quadriceps
muscle of a New Zealand white rabbit. The rabbit was given
three intramuscular injections at an interval of one
week, and after a further three weeks the dissolved COMC
protein, diluted in one ml PBS, was injected intravenously,
and the procedure was repeated two weeks later. Eleven weeks
after the beginning of the immunization, the serum was
obtained from the rabbit. Purified C. pneumoniae EBs were
separated by SDS-PAGE, and the proteins were
electrotransferred to a nitrocellulose membrane. The membrane
was blocked and immunostained with the polyclonal COMC
antibody (Figure 3). The serum recognized proteins with a
size of 100/95, 60 and 38 kDa in the EB preparation. This is
in agreement with the sizes of the outer membrane proteins.

Cloning of the COMC proteins 

Due to the cultivation of C. pneumoniae in HeLa cells,
contaminating host cell DNA could be present in the EB
preparations. Therefore, the purified EB preparations were
treated with DNAse to remove contaminating DNA. The C.
pneumoniae DNA was then purified by CsCl gradient
centrifugation. The C. pneumoniae DNA was partially digested
with Sau3A and the fractions containing DNA fragments with a
size of approx. 0.5 to 4.0 kb were cloned into the expression
vector system pEX (Boehringer, Germany cat. No. 1034 766,
1034 774, 1034 782). The pEX vector system has a
β-galactosidase gene with multiple cloning sites in the 3' end
of the β-galactosidase gene. Expression of the gene is
regulated by the PR promoter, so the protein expression can
be induced by elevating the temperature from 32 to 42°C. The
colonies of recombinant bacteria were transferred to
nitrocellulose membranes, and the temperature was increased
to 42°C for two hours. The bacteria were lysed by placing the
nitrocellulose membranes on filters soaked in 5% SDS. The
colonies expressing outer membrane proteins were detected
with the polyclonal antibody raised against C. pneumoniae
COMC. The positive clones were cultivated in suspension and
induced at 42°C for two hours. The protein profiles of the
clones were analysed by SDS-PAGE, and increases in the size
of the induced β-galactosidase were observed (Figure 4). In
addition, the proteins were electrotransferred to
nitrocellulose membranes, and the reaction with the
polyclonal serum against COMC was confirmed (Figure 5).
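
The size selection following the partial Sau3A digest can be illustrated in silico. The following is a minimal Python sketch and not part of the original work. It assumes the Sau3A recognition sequence GATC with the cut immediately 5' of the G, and it assumes that only fragments bounded by two Sau3A sites are clonable; the function names and the toy sequence are hypothetical.

```python
import re

SAU3A_SITE = "GATC"  # assumed recognition sequence; cut falls just before the G


def cut_positions(dna: str, site: str = SAU3A_SITE) -> list[int]:
    """Return 0-based indices at which the strand is cut (before each site)."""
    return [match.start() for match in re.finditer(re.escape(site), dna.upper())]


def partial_digest_fragments(dna: str, min_len: int, max_len: int) -> list[int]:
    """Return sorted lengths of fragments bounded by ANY two cut sites.

    In a partial digest not every site is cut, so fragments can span
    one or more uncut internal sites. Only lengths inside the window
    [min_len, max_len] are kept, mimicking the size fractionation.
    """
    cuts = cut_positions(dna)
    lengths = []
    for i, left in enumerate(cuts):
        for right in cuts[i + 1:]:
            length = right - left
            if min_len <= length <= max_len:
                lengths.append(length)
    return sorted(lengths)


# Hypothetical toy sequence with three GATC sites.
toy_dna = "AAGATCAAAAGATCAAGATCTT"
selected = partial_digest_fragments(toy_dna, 7, 20)
```

For a genome-scale sequence the window would be set to 500 and 4000 to reflect the 0.5 to 4.0 kb fraction described above.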

Sequencing of positive COMC clones

To characterize the pEX clones, the inserted C. pneumoniae
DNA was sequenced. The resulting DNA sequences were searched
against the prokaryotic sequences in the GenEmbl database.
The search identified 6 clones as part of the Omp2 gene,
2 clones as part of the Omp3 gene, and 2 clones as part of
the MOMP gene, indicating that COMC proteins had been
successfully cloned. Furthermore, 32 clones were obtained
containing DNA sequences not found in the GenEmbl database.
These sequences could, however, be clustered in two contigs
of 6 and 4 clones, and three clones were identical. In
addition 19 clones were found with no overlap to the contigs
(Figure 7). To obtain more sequence data for the genes, C.
pneumoniae DNA was totally digested with BamHI restriction
enzyme, and the fragments were cloned into the vector
pBluescript. The ligated DNA was electrotransformed into E.
coli XL1-Blue and selected on plates containing ampicillin.
The recombinant bacterial colonies were transferred to a
nitrocellulose membrane, and colony hybridisation was
performed using the insert of the pEX1-1 clone as a probe. A
clone containing a single BamHI fragment of 4.5 kb was found,
and the hybridisation to the probe was confirmed by Southern
blotting. The insert of the clone was sequenced
bi-directionally using synthetic primers for approx. each 300
bp. The sequence of the BamHI fragment made it possible to
join the two contigs of pEX clones. In total, together with
the pEX clones, it was possible to assemble 6.5 kb of DNA
sequence, encoding two new COMC proteins (Figure 6).
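
The bi-directional sequencing with a primer roughly every 300 bp can be pictured as a simple coordinate calculation. The sketch below is illustrative only: the function name is hypothetical, the spacing is taken as exactly 300 bp, and real primer placement would depend on the sequence obtained in the preceding read.

```python
def walking_primer_starts(insert_length_bp: int, step_bp: int = 300):
    """Return (forward, reverse) lists of 1-based primer start coordinates.

    Forward primers start at position 1 and advance by step_bp;
    reverse primers start at the far end and move back by step_bp.
    """
    forward = list(range(1, insert_length_bp + 1, step_bp))
    reverse = list(range(insert_length_bp, 0, -step_bp))
    return forward, reverse


# The 4.5 kb BamHI fragment described above, with the assumed 300 bp spacing.
forward_starts, reverse_starts = walking_primer_starts(4500)
```

Under these assumptions each strand of the 4.5 kb insert needs 15 primer positions.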

Additional sequences were obtained by PCR performed on
purified C. pneumoniae DNA with primers both from the known
Omp genes and from other known genes. The obtained PCR
products were sequenced. The sequence organisation is shown
in Fig. 7. An additional 8 Omp genes were detected. The
alignment of the deduced amino acid sequences is shown in
Fig. 8 A and B.

Analysis of DNA sequence

The DNA sequences encode the Omp4-15 proteins, with sizes of
89.6-100.3 kDa (and for Omp13: 56.1 kDa). Omp4 and Omp5 were
transcribed in opposite directions. Downstream of Omp4 a
possible termination structure was located. The 3' end of the
Omp5 gene was not cloned due to the presence of the BamHI
restriction enzyme site positioned within the gene. The
translated DNA sequences of Omp4 and Omp5 were compared by use
of the gap programme in the GCG package (Wisconsin package,
version 8.1-UNIX, August 1995, sequence analysis software
package). The two genes had an amino acid identity of 41%
(similarity 61%), and a possible cleavage site for signal
peptidase I was present at amino acid 17 in Omp4 and amino
acid 25 in Omp5. When the amino acid sequences encoded by two
other pEX clones were compared to the sequences of Omp4 and
Omp5 they also had amino acid homology to the genes. It is
seen that the two clones have homology to the same area in
the Omp4 and Omp5 proteins. Consequently, the pEX clones must
have originated from two additional genes. Therefore these
genes were named Omp6 and Omp7. Similar analyses were
performed with the other genes. In contrast to what was seen
for Omp4 and 5, none of the other putative Omp proteins had a
cleavage site for signal peptides.
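
An identity figure such as the 41% reported for Omp4 and Omp5 is derived from a pairwise alignment. The following minimal Python sketch shows one common way of computing percent identity from two already aligned sequences. It is not the GCG gap programme: the exact scoring and denominator used by that programme are not stated in the text, so this sketch assumes that columns containing a gap are skipped. The function name and the toy alignment are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the aligned columns that contain no gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for residue_a, residue_b in zip(aligned_a.upper(), aligned_b.upper()):
        if residue_a == gap or residue_b == gap:
            continue  # gap columns are ignored (assumption, see above)
        compared += 1
        if residue_a == residue_b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Hypothetical toy alignment: 6 gap-free columns, 5 of them identical.
toy_identity = percent_identity("MKT-GGAI", "MRTAGG-I")
```

Similarity would be computed the same way, except that conservative substitutions, as defined by a scoring matrix, would also be counted.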

EXAMPLE 2 

Polyclonal monospecific antibodies against pEX fusion
proteins and full length recombinant Omp4

To investigate the topology of the Omp4-7 proteins,
representative pEX clones were selected from each gene. The
fusion proteins of β-galactosidase/Omp were induced, and the
proteins were partially purified as inclusion bodies. Balb/c
mice were immunized three times intramuscularly with the
antigens at an interval of one week, and after six weeks the
serum was obtained from the mice. HeLa cells were infected
with C. pneumoniae. 72 hours after the infection the
monolayers were fixed with 3.7% formaldehyde. This treatment
makes the outer membrane of the Chlamydia impermeable to
antibodies due to the extensive cross-linking of the outer
membrane proteins by the formaldehyde. The HeLa cells were
permeabilized with 0.2% Triton X-100, the monolayers were
washed in PBS, then incubated with 20% (v/v) FCS to
inactivate free radicals of the formaldehyde. The mouse sera
were diluted 1:100 in PBS with 20% (v/v) FCS and incubated with
the monolayers for half an hour. The monolayers were washed
in PBS and secondary FITC-conjugated rabbit anti-mouse serum
was added for half an hour, and the monolayers were washed
and mounted. Several of the antibodies reacted strongly with
the EBs in the inclusions (Figure 9). In spite of the
formaldehyde fixation it could not be excluded that the
surface of the EB was changed by the treatments, so that the
antibodies could get access to the Omp4-7. Therefore, the
reaction was confirmed by immuno-electron microscopy with the
antibody raised against clone pEX3-36. Purified EBs of C.
pneumoniae were adsorbed to carbon-coated nickel grids. After
the adsorption the grids were washed with PBS and blocked in
0.5% ovalbumin dissolved in PBS. The antibodies were diluted
1:100 in the same buffer and incubated for 30 minutes. The
grids were washed in PBS. Rabbit anti-mouse Ig conjugated
with 10 nm colloidal gold diluted in PBS containing 1% gelatin
was added to the grids for half an hour. The grids were
washed 3 times in PBS with 1% gelatin and 3 times in PBS, and
the grids were contrast-stained with 0.7% phosphotungstic acid.
The grids were analysed in a Jeol 1010 electron microscope at 40
kV. It was seen that the gold particles were covering the
surface of the purified EB. Because the C. pneumoniae EBs
were not exposed to any detergent or fixation during either
the purification or the reaction with antibodies, these
results show that the cloned proteins have surface-exposed
epitopes.

Polyclonal monospecific antibodies against Omp4 

The Omp4 gene was amplified by PCR with primers that
contained LIC sites, and the PCR product was cloned into the
pET-30 LIC vector (Novagen). The histidine tagged fusion
protein was expressed by induction of the synthesis by IPTG
and purified over a nickel column. The purified Omp4 protein
was used for immunization of a rabbit (six times, 8 µg each
time).

Use of rabbit polyclonal antibodies to recombinant Omp4 for
detection of Chlamydia pneumoniae in paraffin embedded
sections

The lungs of C. pneumoniae infected mice were obtained three
days after intranasal infection. The tissue samples were
fixed in 4% formaldehyde, paraffin embedded, sectioned and
deparaffinized prior to staining. The sections were incubated
with the rabbit serum diluted 1:200 in TBS (150 mM NaCl,
20 mM Tris pH 7.5) for 30 min at room temperature. After washing
two times in TBS the sections were incubated with the
secondary antibody (biotinylated goat anti-rabbit antibodies)
diluted 1:300 in TBS, followed by two washes in TBS. The
sections were stained with streptavidin-biotin complex
(streptABComplex/AP, Dako) for 30 min, washed and developed
under microscopic inspection with chromogen + new fuchsin
(Vector laboratories). The sections were counterstained with
Hematoxylin and analyzed by microscopy.

Immunoblotting analysis with hyperimmune monospecific rabbit
antiserum

The insert of the pEX1-1 clone was amplified by PCR using primers
containing LIC sites. The PCR product could therefore be
inserted in the pET-32 LIC vector (Novagen, UK cat No. 69076-1).
Thereby the insert sequence of the pEX1-1 clone was
expressed in the new vector as a fusion protein; the part of
the fusion protein encoded by the pET-32 LIC vector had 6
histidine residues in a row. The expression of the fusion
protein was induced in this vector, and the fusion protein
could be purified under denaturing conditions on a Ni2+ column
due to the high affinity of the histidine residues for
divalent cations. The purified protein was used for
immunization of a New Zealand white rabbit. After 6
intramuscular and 2 intravenous immunizations the serum
was obtained from the rabbit. Purified C. pneumoniae EB was
dissolved in SDS-sample buffer. Half of the sample was heated
to 100°C in the sample buffer, whereas the other half of the
sample was not heated. The samples were separated by
SDS-PAGE, the proteins were transferred to
nitrocellulose, and the serum was reacted with the strips. With
the samples heated to 100°C the serum recognized a high
molecular weight band of approximately 98 kDa. This is in
agreement with the predicted size of Omp5, of which the
pEX1-1 clone is a part. However, when the antibody was
reacted with the strip with unheated EB, the pattern was
different. Now a band was seen with a size of 75 kDa; in
addition weaker bands were observed above the band (Figure
10). These data demonstrate that Omp5 needs boiling in
SDS-sample buffer to be fully denatured and migrate with a
size as predicted from the gene product. When the samples
were not boiled, the protein was not fully denatured, less
SDS bound to the protein, and it had a more globular structure
that migrates faster in the acrylamide gel. The band
pattern looked identical to what was obtained with a
monoclonal antibody (MAb 26.1) (lane 6) that we have described
earlier (Christiansen et al., 1994), which reacts with the
surface of C. pneumoniae EB but does not react
with fully SDS-denatured C. pneumoniae EB in
immunoblotting.
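
The size "predicted from the gene product" is the mass of the deduced polypeptide. The following minimal Python sketch shows how such a value can be estimated from an amino acid sequence. It uses commonly tabulated approximate average residue masses; the function names are hypothetical, and the calculation ignores signal peptide cleavage and any modification. It gives the mass of the polypeptide, not its apparent size in SDS-PAGE, which can differ, as the unheated samples above show.

```python
# Approximate average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594,
    "N": 114.1038, "D": 115.0886, "Q": 128.1307, "K": 128.1741,
    "E": 129.1155, "M": 131.1926, "H": 137.1411, "F": 147.1766,
    "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.015  # one water is added back for the free N- and C-termini


def molecular_mass_da(protein: str) -> float:
    """Estimate the average molecular mass of an unmodified polypeptide in Da."""
    protein = protein.upper()
    if not protein:
        return 0.0
    unknown = set(protein) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unknown residue code(s): {sorted(unknown)}")
    return sum(RESIDUE_MASS[residue] for residue in protein) + WATER_MASS


def molecular_mass_kda(protein: str) -> float:
    """Same estimate expressed in kDa."""
    return molecular_mass_da(protein) / 1000.0
```

Applied to a deduced sequence from the sequence listing, `molecular_mass_kda` would give a value that can be compared with the band sizes observed after boiling.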



WO 98/58953 PCT/DK98/00266

Experimental infection of C57 black mice

Due to the realization of the altered migration of the Omp4-7
proteins without boiling, we chose to analyse antibodies
against C. pneumoniae EBs after an experimental infection of
mice. To obtain antibodies from an infection caused by C.
pneumoniae, C57 black mice were inoculated intranasally with
10^7 CFU of C. pneumoniae under a light ether anaesthesia.
After 14 days of infection the serum samples were obtained
and the lungs were analysed for pathological changes. In two
of the mice a severe pneumonia was observed in the lung
sections, and in the third mouse only minor changes were
observed. The serum from the mice was diluted 1:100 and
reacted with purified EBs dissolved in sample buffer with and
without boiling. In the preparations that had been heated to
100°C the sera from two of the mice reacted strongly with
bands of 60/62 kDa and weaker bands of 55 kDa, but no
reaction was observed with proteins of the size of Omp4-7
(Figure 11). However, when the sera were reacted with the
preparation that had not been heated they all had a strong
reaction with a broad band of an approximate size of 75 kDa.
This is in agreement with the size of the Omp4-7 proteins in
the unheated preparation. Therefore, it could be concluded
that the epitopes of the Omp4-7 proteins recognized by the
antibodies after a C. pneumoniae infection were discontinuous
epitopes because the full denaturation of the antigen
completely destroyed the epitopes. The 75 kDa protein
observed in unheated samples is not Omp2 (shown by
immunoblotting with an Omp2-specific antibody).
EXAMPLE 3 

Comparison of Omp4-7 of C. pneumoniae with putative outer
membrane proteins (POMP) of C. psittaci

Longbottom et al. (1996) have published partial sequences of
90 to 98 kDa proteins from C. psittaci. They have entered the
full sequences of five genes in this family in the EMBL database.
They have named the genes "putative outer membrane proteins"
(POMP) since their precise location was not determined. The
family is composed of two genes that are completely
identical and two genes with high homology to these genes.
They calculated molecular sizes of 90 and 91 kDa. The fifth
gene encodes a protein of 98 kDa. The sequences of the Omp4-7
proteins of C. pneumoniae were compared to the sequences of
the C. psittaci POMP proteins with the programme pileup in
the GCG package. The amino acid homologies were in the range
of 51-63%. The C. pneumoniae Omp4-5 proteins
are most closely related to the 98 kDa POMP protein of C. psittaci.
Interestingly, the 98 kDa C. psittaci POMP protein is more
closely related to the C. pneumoniae genes than to the other C.
psittaci genes. The repeated GGAI sequences were conserved
in the 98 kDa POMP protein, but only three GGAI repeats were
present in the 90 and 91 kDa C. psittaci POMP proteins. For
C. psittaci it has been shown that antibodies to these
proteins seem to be protective against infection.
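
The comparison above rests on two simple quantities: the percentage of matching residues between aligned proteins and the number of GGAI repeats in each sequence. As a rough illustration only, the following Python sketch computes percent identity over the gap-free columns of two pre-aligned sequences and counts occurrences of a motif. The function names, the gap character and the toy sequences are assumptions made for this example; this is not the GCG pileup algorithm, whose scoring may differ.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def count_motif(sequence, motif="GGAI"):
    """Count occurrences of a motif (one-letter code), allowing overlaps."""
    count = 0
    start = 0
    while True:
        index = sequence.find(motif, start)
        if index == -1:
            return count
        count += 1
        start = index + 1


# Hypothetical toy sequences, not taken from the sequence listing.
toy_a = "GGAIS-NK"
toy_b = "GGALSTNK"
identity = percent_identity(toy_a, toy_b)
repeats = count_motif("TTGGAISGGAIH")
```

With real data, the one-letter sequences would first have to be aligned by an external programme; the functions above only summarise an alignment that already exists.
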
SEQUENCE LISTING

(1) GENERAL INFORMATION 

(i) APPLICANT 

(A) NAME: Svend Birkelund 

(B) STREET: Dept. of Medical Microbiology and Immunology, 

University of Arhus 

(C) CITY: Arhus C 

(D) STATE OR PROVINCE: 

(E) COUNTRY: Denmark 

(F) POSTAL CODE: 8000 

(ii) TITLE OF THE INVENTION: Chlamydia pneumoniae antigens

(iii) NUMBER OF SEQUENCES : 30 

(iv) COMPUTER-READABLE FORM:

(A) MEDIUM TYPE: Diskette 

(B) COMPUTER: IBM Compatible 

(C) OPERATING SYSTEM: DOS 

(D) SOFTWARE: FastSEQ for Windows Version 2.0 

(v) CURRENT APPLICATION DATA: 
(A) APPLICATION NUMBER: 

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3200 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ix) FEATURE: 

(A) NAME/KEY: Coding Sequence

(B) LOCATION: 205...2987
(D) OTHER INFORMATION: 
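
The FEATURE entry above places the start of the coding sequence at position 205 of SEQ ID NO:1. The following is a minimal sketch, assuming the standard genetic code and 1-based coordinates, of how such an entry can be used to translate the listed nucleotides. The name `translate_cds` and the variable names are hypothetical, and only the first 231 nucleotides of SEQ ID NO:1 are included.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO)
}


def translate_cds(sequence, start, end=None):
    """Translate from 1-based position `start` (to `end`, inclusive, if given).

    Whitespace in `sequence` is ignored; translation stops at the first
    stop codon, and unknown codons are reported as 'X'.
    """
    sequence = "".join(sequence.split()).upper()
    cds = sequence[start - 1:end]
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino = CODON_TABLE.get(cds[i:i + 3], "X")
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)


# First 231 nucleotides of SEQ ID NO:1 as printed in the listing.
SEQ1_START = """
CAATGTCGAA GAGAGCACTA ACCAGGAAAA TTGCGATTTC ATAAACCCAC TTTATTATTA
AATTCTTACT TGCGTCATAT AAAATAGAAA ACTCAGAGAG TCAAGATAAA AATTCTTGAC
AGCTGTTTTG TCATCTTTAA CTTGATTTAC TTATTTTGTT TCTATATTGA TGCGAATAGT
TCTCTAAAAA ACAAAAGCAT TACC ATG AAG ACT TCG ATT CCT TGG GTT TTA
"""

print(translate_cds(SEQ1_START, 205))  # MKTSIPWVL
```

The one-letter result corresponds to the first nine residues printed under the nucleotide sequence (Met Lys Thr Ser Ile Pro Trp Val Leu).
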



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

CAATGTCGAA GAGAGCACTA ACCAGGAAAA TTGCGATTTC ATAAACCCAC TTTATTATTA 60
AATTCTTACT TGCGTCATAT AAAATAGAAA ACTCAGAGAG TCAAGATAAA AATTCTTGAC 120
AGCTGTTTTG TCATCTTTAA CTTGATTTAC TTATTTTGTT TCTATATTGA TGCGAATAGT 180
TCTCTAAAAA ACAAAAGCAT TACC ATG AAG ACT TCG ATT CCT TGG GTT TTA 231
Met Lys Thr Ser Ile Pro Trp Val Leu
1 5

GTT TCC TCC GTG TTA GCT TTC TCA TGT CAC CTA CAG TCA CTA GCT AAC 279
Val Ser Ser Val Leu Ala Phe Ser Cys His Leu Gln Ser Leu Ala Asn
10 15 20 25

GAG GAA CTT TTA TCA CCT GAT GAT AGC TTT AAT GGA AAT ATC GAT TCA 327
Glu Glu Leu Leu Ser Pro Asp Asp Ser Phe Asn Gly Asn Ile Asp Ser
30 35 40

GGA ACG TTT ACT CCA AAA ACT TCA GCC ACA ACA TAT TCT CTA ACA GGA 375
Gly Thr Phe Thr Pro Lys Thr Ser Ala Thr Thr Tyr Ser Leu Thr Gly
45 50 55

GAT GTC TTC TTT TAC GAG CCT GGA AAA GGC ACT CCC TTA TCT GAC AGT 423
Asp Val Phe Phe Tyr Glu Pro Gly Lys Gly Thr Pro Leu Ser Asp Ser
60 65 70

TGT TTT AAG CAA ACC ACG GAC AAT CTT ACC TTC TTG GGG AAC GGT CAT 471
Cys Phe Lys Gln Thr Thr Asp Asn Leu Thr Phe Leu Gly Asn Gly His
75 80 85

AGC TTA ACG TTT GGC TTT ATA GAT GCT GGC ACT CAT GCA GGT GCT GCT 519
Ser Leu Thr Phe Gly Phe Ile Asp Ala Gly Thr His Ala Gly Ala Ala
90 95 100 105

GCA TCT ACA ACA GCA AAT AAG AAT CTT ACC TTC TCA GGG TTT TCC TTA 567
Ala Ser Thr Thr Ala Asn Lys Asn Leu Thr Phe Ser Gly Phe Ser Leu
110 115 120

CTG AGT TTT GAT TCC TCT CCT AGC ACA ACG GTT ACT ACA GGT CAG GGA 615
Leu Ser Phe Asp Ser Ser Pro Ser Thr Thr Val Thr Thr Gly Gln Gly
125 130 135

ACG CTT TCC TCA GCA GGA GGC GTA AAT TTA GAA AAT ATT CGT AAA CTT 663
Thr Leu Ser Ser Ala Gly Gly Val Asn Leu Glu Asn Ile Arg Lys Leu
140 145 150

GTA GTT GCT GGG AAT TTT TCT ACT GCA GAT GGT GGA GCT ATC AAA GGA 711
Val Val Ala Gly Asn Phe Ser Thr Ala Asp Gly Gly Ala Ile Lys Gly
155 160 165

GCG TCT TTC CTT TTA ACT GGC ACT TCT GGA GAT GCT CTT TTT AGT AAC 759
Ala Ser Phe Leu Leu Thr Gly Thr Ser Gly Asp Ala Leu Phe Ser Asn
170 175 180 185

AAC TCT TCA TCA ACA AAG GGA GGA GCA ATT GCT ACT ACA GCA GGC GCT 807
Asn Ser Ser Ser Thr Lys Gly Gly Ala Ile Ala Thr Thr Ala Gly Ala
190 195 200

CGC ATA GCA AAT AAC ACA GGT TAT GTT AGA TTC CTA TCT AAC ATA GCG 855
Arg Ile Ala Asn Asn Thr Gly Tyr Val Arg Phe Leu Ser Asn Ile Ala
205 210 215

TCT ACG TCA GGA GGC GCT ATC GAT GAT GAA GGC ACG TCG ATA CTA TCG 903
Ser Thr Ser Gly Gly Ala Ile Asp Asp Glu Gly Thr Ser Ile Leu Ser
220 225 230

AAC AAC AAA TTT CTA TAT TTT GAA GGG AAT GCA GCG AAA ACT ACT GGC 951
Asn Asn Lys Phe Leu Tyr Phe Glu Gly Asn Ala Ala Lys Thr Thr Gly
235 240 245

GGT GCG ATC TGC AAC ACC AAG GCG AGT GGA TCT CCT GAA CTG ATA ATC 999
Gly Ala Ile Cys Asn Thr Lys Ala Ser Gly Ser Pro Glu Leu Ile Ile
250 255 260 265

TCT AAC AAT AAG ACT CTG ATC TTT GCT TCA AAC GTA GCA GAA ACA AGC 1047
Ser Asn Asn Lys Thr Leu Ile Phe Ala Ser Asn Val Ala Glu Thr Ser
270 275 280

GGT GGC GCC ATC CAT GCT AAA AAG CTA GCC CTT TCC TCT GGA GGC TTT 1095
Gly Gly Ala Ile His Ala Lys Lys Leu Ala Leu Ser Ser Gly Gly Phe
285 290 295

ACA GAG TTT CTA CGA AAT AAT GTC TCA TCA GCA ACT CCT AAG GGG GGT 1143
Thr Glu Phe Leu Arg Asn Asn Val Ser Ser Ala Thr Pro Lys Gly Gly
300 305 310

GCT ATC AGC ATC GAT GCC TCA GGA GAG CTC AGT CTT TCT GCA GAG ACA 1191
Ala Ile Ser Ile Asp Ala Ser Gly Glu Leu Ser Leu Ser Ala Glu Thr
315 320 325

GGA AAC ATT ACC TTT GTA AGA AAT ACC CTT ACA ACA ACC GGA AGT ACC 1239
Gly Asn Ile Thr Phe Val Arg Asn Thr Leu Thr Thr Thr Gly Ser Thr
330 335 340 345

GAT ACT CCT AAA CGT AAT GCG ATC AAC ATA GGA AGT AAC GGG AAA TTC 1287
Asp Thr Pro Lys Arg Asn Ala Ile Asn Ile Gly Ser Asn Gly Lys Phe
350 355 360

ACG GAA TTA CGG GCT GCT AAA AAT CAT ACA ATT TTC TTC TAT GAT CCC 1335
Thr Glu Leu Arg Ala Ala Lys Asn His Thr Ile Phe Phe Tyr Asp Pro
365 370 375

ATC ACT TCA GAA GGA ACC TCA TCA GAC GTA TTG AAG ATA AAT AAC GGC 1383
Ile Thr Ser Glu Gly Thr Ser Ser Asp Val Leu Lys Ile Asn Asn Gly
380 385 390

TCT GCG GGA GCT CTC AAT CCA TAT CAA GGA ACG ATT CTA TTT TCT GGA 1431
Ser Ala Gly Ala Leu Asn Pro Tyr Gln Gly Thr Ile Leu Phe Ser Gly
395 400 405

GAA ACC CTA ACA GCA GAT GAA CTT AAA GTT GCT GAC AAT TTA AAA TCT 1479
Glu Thr Leu Thr Ala Asp Glu Leu Lys Val Ala Asp Asn Leu Lys Ser
410 415 420 425

TCA TTC ACG CAG CCA GTC TCC CTA TCC GGA GGA AAG TTA TTG CTA CAA 1527
Ser Phe Thr Gln Pro Val Ser Leu Ser Gly Gly Lys Leu Leu Leu Gln
430 435 440

AAG GGA GTC ACT TTA GAG AGC ACG AGC TTC TCT CAA GAG GCC GGT TCT 1575
Lys Gly Val Thr Leu Glu Ser Thr Ser Phe Ser Gln Glu Ala Gly Ser
445 450 455

CTC CTC GGC ATG GAT TCA GGA ACG ACA TTA TCA ACT ACA GCT GGG AGT 1623
Leu Leu Gly Met Asp Ser Gly Thr Thr Leu Ser Thr Thr Ala Gly Ser
460 465 470

ATT ACA ATC ACG AAC CTA GGA ATC AAT GTT GAC TCC TTA GGT CTT AAG 1671
Ile Thr Ile Thr Asn Leu Gly Ile Asn Val Asp Ser Leu Gly Leu Lys
475 480 485

CAG CCC GTC AGC CTA ACA GCA AAA GGT GCT TCA AAT AAA GTG ATC GTA 1719
Gln Pro Val Ser Leu Thr Ala Lys Gly Ala Ser Asn Lys Val Ile Val
490 495 500 505

TCT GGG AAG CTC AAC CTG ATT GAT ATT GAA GGG AAC ATT TAT GAA AGT 1767
Ser Gly Lys Leu Asn Leu Ile Asp Ile Glu Gly Asn Ile Tyr Glu Ser
510 515 520

CAT ATG TTC AGC CAT GAC CAG CTC TTC TCT CTA TTA AAA ATC ACG GTT 1815
His Met Phe Ser His Asp Gln Leu Phe Ser Leu Leu Lys Ile Thr Val
525 530 535

Gbt firr rvrrr m\T 7\n<r> nnm <~>-*s-i »m<-i -» r,r* . . ^ 

— - ™~ a uii otrt\^ -kGC aGl en hiu uur UTI lb 6 3 

Asp Ala Asp Val Asp Thr Asn Val Asp Ile Ser Ser Leu Ile Pro Val
540 545 550

CCT GCT GAG GAT CCT AAT TCA GAA TAC GGA TTC CAA GGA CAA TGG AAT 1911
Pro Ala Glu Asp Pro Asn Ser Glu Tyr Gly Phe Gln Gly Gln Trp Asn
555 560 565

GTT AAT TGG ACT ACG GAT ACA GCT ACA AAT ACA AAA GAG GCC ACG GCA 1959
Val Asn Trp Thr Thr Asp Thr Ala Thr Asn Thr Lys Glu Ala Thr Ala
570 575 580 585

ACT TGG ACC AAA ACA GGA TTT GTT CCC AGC CCC GAA AGA AAA TCT GCG 2007
Thr Trp Thr Lys Thr Gly Phe Val Pro Ser Pro Glu Arg Lys Ser Ala
590 595 600

TTA GTA TGC AAT ACC CTA TGG GGA GTC TTT ACT GAC ATT CGC TCT CTG 2055
Leu Val Cys Asn Thr Leu Trp Gly Val Phe Thr Asp Ile Arg Ser Leu
605 610 615

CAA CAG CTT GTA GAG ATC GGC GCA ACT GGT ATG GAA CAC AAA CAA GGT 2103
Gln Gln Leu Val Glu Ile Gly Ala Thr Gly Met Glu His Lys Gln Gly
620 625 630

TTC TGG GTT TCC TCC ATG ACG AAC TTC CTG CAT AAG ACT GGA GAT GAA 2151
Phe Trp Val Ser Ser Met Thr Asn Phe Leu His Lys Thr Gly Asp Glu
635 640 645

AAT CGC AAA GGC TTC CGT CAT ACC TCT GGA GGC TAC GTC ATC GGT GGA 2199
Asn Arg Lys Gly Phe Arg His Thr Ser Gly Gly Tyr Val Ile Gly Gly
650 655 660 665

AGT GCT CAC ACT CCT AAA GAC GAC CTA TTT ACC TTT GCG TTC TGC CAT 2247
Ser Ala His Thr Pro Lys Asp Asp Leu Phe Thr Phe Ala Phe Cys His
670 675 680

CTC TTT GCT AGA GAC AAA GAT TGT TTT ATC GCT CAC AAC AAC TCT AGA 2295
Leu Phe Ala Arg Asp Lys Asp Cys Phe Ile Ala His Asn Asn Ser Arg
685 690 695

ACC TAC GGT GGA ACT TTA TTC TTC AAG CAC TCT CAT ACC CTA CAA CCC 2343
Thr Tyr Gly Gly Thr Leu Phe Phe Lys His Ser His Thr Leu Gln Pro
700 705 710

CAA AAC TAT TTG AGA TTA GGA AGA GCA AAG TTT TCT GAA TCA GCT ATA 2391
Gln Asn Tyr Leu Arg Leu Gly Arg Ala Lys Phe Ser Glu Ser Ala Ile
715 720 725

GAA AAA TTC CCT AGG GAA ATT CCC CTA GCC TTG GAT GTC CAA GTT TCG 2439
Glu Lys Phe Pro Arg Glu Ile Pro Leu Ala Leu Asp Val Gln Val Ser
730 735 740 745

TTC AGC CAT TCA GAC AAC CGT ATG GAA ACG CAC TAT ACC TCA TTG CCA 2487
Phe Ser His Ser Asp Asn Arg Met Glu Thr His Tyr Thr Ser Leu Pro
750 755 760

GAA TCC GAA GGT TCT TGG AGC AAC GAG TGT ATA GCT GGT GGT ATC GGC 2535
Glu Ser Glu Gly Ser Trp Ser Asn Glu Cys Ile Ala Gly Gly Ile Gly
765 770 775

CTA GAC CTT CCT TTT GTT CTT TCC AAC CCA CAT CCT CTT TTC AAG ACC 2583
Leu Asp Leu Pro Phe Val Leu Ser Asn Pro His Pro Leu Phe Lys Thr
780 785 790

TTC ATT CCA CAG ATG AAA GTC GAA ATG GTT TAT GTA TCA CAA AAT AGC 2631
Phe Ile Pro Gln Met Lys Val Glu Met Val Tyr Val Ser Gln Asn Ser
795 800 805

TTC TTC GAA AGC TCT AGT GAT GGC CGT GGT TTT AGT ATT GGA AGG CTG 2679
Phe Phe Glu Ser Ser Ser Asp Gly Arg Gly Phe Ser Ile Gly Arg Leu
810 815 820 825

CTT AAC CTC TCG ATT CCT GTG GGT GCG AAA TTC GTG CAG GGG GAT ATC 2727
Leu Asn Leu Ser Ile Pro Val Gly Ala Lys Phe Val Gln Gly Asp Ile
830 835 840

GGA GAT TCC TAC ACC TAT GAT CTC TCA GGA TTC TTT GTT TCC GAT GTC 2775
Gly Asp Ser Tyr Thr Tyr Asp Leu Ser Gly Phe Phe Val Ser Asp Val
845 850 855

TAT CGT AAC AAT CCC CAA TCT ACA GCG ACT CTT GTG ATG AGC CCA GAC 2823
Tyr Arg Asn Asn Pro Gln Ser Thr Ala Thr Leu Val Met Ser Pro Asp
860 865 870

TCT TGG AAA ATT CGC GGT GGC AAT CTT TCA AGA CAG GCA TTT TTA CTG 2871
Ser Trp Lys Ile Arg Gly Gly Asn Leu Ser Arg Gln Ala Phe Leu Leu
875 880 885

AGG GGT AGC AAC AAC TAC GTC TAC AAC TCC AAT TGT GAG CTC TTC GGA 2919
Arg Gly Ser Asn Asn Tyr Val Tyr Asn Ser Asn Cys Glu Leu Phe Gly
890 895 900 905

CAT TAC GCT ATG GAA CTC CGT GGA TCT TCA AGG AAC TAC AAT GTA GAT 2967
His Tyr Ala Met Glu Leu Arg Gly Ser Ser Arg Asn Tyr Asn Val Asp
910 915 920

GTT GGT ACC AAA CTC CGA TT CTAGATTGCT AAAACTCCCT AGTTCTTCTA GGGAG 3022
Val Gly Thr Lys Leu Arg Phe
925

TTTTCTCATA CTTTTAGGGA AATATTTGCT ATAGGGAATG CTTTCCTTGC AAACTGTAAA 3082
AAATAACATT TGTCCCTCTT CAAAAAAGAT TTCTTTTAAT AATTTCTAGT TATAATTTTA 3142
TTTTAAAAAC AGTTAAATAA TTAATAGACA ATAATCTATT CTTATTGACT TCTTTTTT 3200



(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 928 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: 



Met Lys Thr Ser Ile Pro Trp Val Leu Val Ser Ser Val Leu Ala Phe
1 5 10 15

Ser Cys His Leu Gln Ser Leu Ala Asn Glu Glu Leu Leu Ser Pro Asp
20 25 30

Asp Ser Phe Asn Gly Asn Ile Asp Ser Gly Thr Phe Thr Pro Lys Thr
35 40 45

Ser Ala Thr Thr Tyr Ser Leu Thr Gly Asp Val Phe Phe Tyr Glu Pro
50 55 60

Gly Lys Gly Thr Pro Leu Ser Asp Ser Cys Phe Lys Gln Thr Thr Asp
65 70 75 80

Asn Leu Thr Phe Leu Gly Asn Gly His Ser Leu Thr Phe Gly Phe Ile
85 90 95

Asp Ala Gly Thr His Ala Gly Ala Ala Ala Ser Thr Thr Ala Asn Lys
100 105 110

Asn Leu Thr Phe Ser Gly Phe Ser Leu Leu Ser Phe Asp Ser Ser Pro
115 120 125

Ser Thr Thr Val Thr Thr Gly Gln Gly Thr Leu Ser Ser Ala Gly Gly
130 135 140

Val Asn Leu Glu Asn Ile Arg Lys Leu Val Val Ala Gly Asn Phe Ser
145 150 155 160

Thr Ala Asp Gly Gly Ala Ile Lys Gly Ala Ser Phe Leu Leu Thr Gly
165 170 175

Thr Ser Gly Asp Ala Leu Phe Ser Asn Asn Ser Ser Ser Thr Lys Gly
180 185 190

Gly Ala Ile Ala Thr Thr Ala Gly Ala Arg Ile Ala Asn Asn Thr Gly
195 200 205

Tyr Val Arg Phe Leu Ser Asn Ile Ala Ser Thr Ser Gly Gly Ala Ile
210 215 220

Asp Asp Glu Gly Thr Ser Ile Leu Ser Asn Asn Lys Phe Leu Tyr Phe
225 230 235 240

Glu Gly Asn Ala Ala Lys Thr Thr Gly Gly Ala Ile Cys Asn Thr Lys
245 250 255

Ala Ser Gly Ser Pro Glu Leu Ile Ile Ser Asn Asn Lys Thr Leu Ile
260 265 270

Phe Ala Ser Asn Val Ala Glu Thr Ser Gly Gly Ala Ile His Ala Lys
275 280 285

Lys Leu Ala Leu Ser Ser Gly Gly Phe Thr Glu Phe Leu Arg Asn Asn
290 295 300

Val Ser Ser Ala Thr Pro Lys Gly Gly Ala Ile Ser Ile Asp Ala Ser
305 310 315 320

Gly Glu Leu Ser Leu Ser Ala Glu Thr Gly Asn Ile Thr Phe Val Arg
325 330 335

Asn Thr Leu Thr Thr Thr Gly Ser Thr Asp Thr Pro Lys Arg Asn Ala
340 345 350

Ile Asn Ile Gly Ser Asn Gly Lys Phe Thr Glu Leu Arg Ala Ala Lys
355 360 365

Asn His Thr Ile Phe Phe Tyr Asp Pro Ile Thr Ser Glu Gly Thr Ser
370 375 380

Ser Asp Val Leu Lys Ile Asn Asn Gly Ser Ala Gly Ala Leu Asn Pro
385 390 395 400

Tyr Gln Gly Thr Ile Leu Phe Ser Gly Glu Thr Leu Thr Ala Asp Glu
405 410 415

Leu Lys Val Ala Asp Asn Leu Lys Ser Ser Phe Thr Gln Pro Val Ser
420 425 430

Leu Ser Gly Gly Lys Leu Leu Leu Gln Lys Gly Val Thr Leu Glu Ser
435 440 445

Thr Ser Phe Ser Gln Glu Ala Gly Ser Leu Leu Gly Met Asp Ser Gly
450 455 460

Thr Thr Leu Ser Thr Thr Ala Gly Ser Ile Thr Ile Thr Asn Leu Gly
465 470 475 480

Ile Asn Val Asp Ser Leu Gly Leu Lys Gln Pro Val Ser Leu Thr Ala
485 490 495

Lys Gly Ala Ser Asn Lys Val Ile Val Ser Gly Lys Leu Asn Leu Ile
500 505 510

Asp Ile Glu Gly Asn Ile Tyr Glu Ser His Met Phe Ser His Asp Gln
515 520 525

Leu Phe Ser Leu Leu Lys Ile Thr Val Asp Ala Asp Val Asp Thr Asn
530 535 540

Val Asp Ile Ser Ser Leu Ile Pro Val Pro Ala Glu Asp Pro Asn Ser
545 550 555 560

Glu Tyr Gly Phe Gln Gly Gln Trp Asn Val Asn Trp Thr Thr Asp Thr
565 570 575

Ala Thr Asn Thr Lys Glu Ala Thr Ala Thr Trp Thr Lys Thr Gly Phe
580 585 590

Val Pro Ser Pro Glu Arg Lys Ser Ala Leu Val Cys Asn Thr Leu Trp
595 600 605

Gly Val Phe Thr Asp Ile Arg Ser Leu Gln Gln Leu Val Glu Ile Gly
610 615 620

Ala Thr Gly Met Glu His Lys Gln Gly Phe Trp Val Ser Ser Met Thr
625 630 635 640

Asn Phe Leu His Lys Thr Gly Asp Glu Asn Arg Lys Gly Phe Arg His
645 650 655

Thr Ser Gly Gly Tyr Val Ile Gly Gly Ser Ala His Thr Pro Lys Asp
660 665 670

Asp Leu Phe Thr Phe Ala Phe Cys His Leu Phe Ala Arg Asp Lys Asp
675 680 685

Cys Phe Ile Ala His Asn Asn Ser Arg Thr Tyr Gly Gly Thr Leu Phe
690 695 700

Phe Lys His Ser His Thr Leu Gln Pro Gln Asn Tyr Leu Arg Leu Gly
705 710 715 720

Arg Ala Lys Phe Ser Glu Ser Ala Ile Glu Lys Phe Pro Arg Glu Ile
725 730 735

Pro Leu Ala Leu Asp Val Gln Val Ser Phe Ser His Ser Asp Asn Arg
740 745 750

Met Glu Thr His Tyr Thr Ser Leu Pro Glu Ser Glu Gly Ser Trp Ser
755 760 765

Asn Glu Cys Ile Ala Gly Gly Ile Gly Leu Asp Leu Pro Phe Val Leu
770 775 780

Ser Asn Pro His Pro Leu Phe Lys Thr Phe Ile Pro Gln Met Lys Val
785 790 795 800

Glu Met Val Tyr Val Ser Gln Asn Ser Phe Phe Glu Ser Ser Ser Asp
805 810 815

Gly Arg Gly Phe Ser Ile Gly Arg Leu Leu Asn Leu Ser Ile Pro Val
820 825 830

Gly Ala Lys Phe Val Gln Gly Asp Ile Gly Asp Ser Tyr Thr Tyr Asp
835 840 845

Leu Ser Gly Phe Phe Val Ser Asp Val Tyr Arg Asn Asn Pro Gln Ser
850 855 860

Thr Ala Thr Leu Val Met Ser Pro Asp Ser Trp Lys Ile Arg Gly Gly
865 870 875 880

Asn Leu Ser Arg Gln Ala Phe Leu Leu Arg Gly Ser Asn Asn Tyr Val
885 890 895

Tyr Asn Ser Asn Cys Glu Leu Phe Gly His Tyr Ala Met Glu Leu Arg
900 905 910

Gly Ser Ser Arg Asn Tyr Asn Val Asp Val Gly Thr Lys Leu Arg Phe
915 920 925



(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2815 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3: 

ATGAAATCGC AATTTTCCTG GTTAGTGCTC TCTTCGACAT TGGCATGTTT TACTAGTTGT 60 

TCCACTGTTT TTGCTGCAAC TGCTGAAAAT ATAGGCCCCT CTGATAGCTT TGACGGAAGT 120 

ACTAACACAG GCACCTATAC TCCTAAAAAT ACGACTACTG GAATAGACTA TACTCTGACA 180 

GGAGATATAA CTCTGCAAAA CCTTGGGGAT TCGGCAGCTT TAACGAAGGG TTGTTTTTCT 240

GACACTACGG AATCTTTAAG CTTTGCCGGT AAGGGGTACT CACTTTCTTT TTTAAATATT 300 

AAGTCTAGTG CTGAAGGCGC AGCACTTTCT GTTACAACTG ATAAAAATCT GTCGCTAACA 360 

GGATTTTCGA GTCTTACTTT CTTAGCGGCC CCATCATCGG TAATCACAAC CCCCTCAGGA 420 

AAAGGTGCAG TTAAATGTGG AGGGGATCTT ACATTTGATA ACAATGGAAC TATTTTATTT 480 

AAACAAGATT ACTGTGAGGA AAATGGCGGA GCCATTTCTA CCAAGAATCT TTCTTTGAAA 540 

AACAGCACGG GATCGATTTC TTTTGAAGGG AATAAATCGA GCGCAACAGG GAAAAAAGGT 600 

GGGGCTATTT GTGCTACTGG TACTGTAGAT ATTACAAATA ATACGGCTCC TACCCTCTTC 660 

TCGAACAATA TTGCTGAAGC TGCAGGTGGA GCTATAAATA GCACAGGAAA CTGTACAATT 720 

ACAGGGAATA CGTCTCTTGT ATTTTCTGAA AATAGTGTGA CAGCGACCGC AGGAAATGGA 780 

GGAGCTCTTT CTGGAGATGC CGATGTTACC ATATCTGGGA ATCAGAGTGT AACTTTCTCA 840 

GGAAACCAAG CTGTAGCTAA TGGCGGAGCC ATTTATGCTA AGAAGCTTAC ACTGGCTTCC 900 

GGGGGGGGGG GGGGTATCTC CTTTTCTAAC AATATAGTCC AAGGTACCAC TGCAGGTAAT 960 

GGTGGAGCCA TTTCTATACT GGCAGCTGGA GAGTGTAGTC TTTCAGCAGA AGCAGGGGAC 1020 

ATTACCTTCA ATGGGAATGC CATTGTTGCA ACTACACCAC AAACTACAAA AAGAAATTCT 1080 

ATTGACATAG GATCTACTGC AAAGATCACG AATTTACGTG CAATATCTGG GCATAGCATC 1140

TTTTTCTACG ATCCGATTAC TGCTAATACG GCTGCGGATT CTACAGATAC TTTAAATCTC 1200 

AATAAGGCTG ATGCAGGTAA TAGTACAGAT TATAGTGGGT CGATTGTTTT TTCTGGTGAA 1260
AAGCTCTCTG AAGATGAAGC AAAAGTTGCA GACAACCTCA CTTCTACGCT GAAGCAGCCT 1320
GTAACTCTAA CTGCAGGAAA TTTAGTACTT AAACGTGGTG TCACTCTCGA TACGAAAGGC 1380
TTTACTCAGA CCGCGGGTTC CTCTGTTATT ATGGATGCGG GCACAACGTT AAAAGCAAGT 1440
ACAGAGGAGG TCACTTTAAC AGGTCTTTCC ATTCCTGTAG ACTCTTTAGG CGAGGGTAAG 1500
AAAGTTGTAA TTGCTGCTTC TGCAGCAAGT AAAAATGTAG CCCTTAGTGG TCCGATTCTT 1560
CTTTTGGATA ACCAAGGGAA TGCTTATGAA AATCACGACT TAGGAAAAAC TCAAGACTTT 1620
TCATTTGTGC AGCTCTCTGC TCTGGGTACT GCAACAACTA CAGATGTTCC AGCGGTTCCT 1680
ACAGTAGCAA CTCCTACGCA CTATGGGTAT CAAGGTACTT GGGGAATGAC TTGGGTTGAT 1740
GATACCGCAA GCACTCCAAA GACTAAGACA GCGACATTAG CTTGGACCAA TACAGGCTAC 1800
CTTCCGAATC CTGAGCGTCA AGGACCTTTA GTTCCTAATA GCCTTTGGGG ATCTTTTTCA 1860
GACATCCAAG CGATTCAAGG TGTCATAGAG AGAAGTGCTT TGACTCTTTG TTCAGATCGA 1920
GGCTTCTGGG CTGCGGGAGT CGCCAATTTC TTAGATAAAG ATAAGAAAGG GGAAAAACGC 1980
AAATACCGTC ATAAATCTGG TGGATATGCT ATCGGAGGTG CAGCGCAAAC TTGTTCTGAA 2040
AACTTAATTA GCTTTGCCTT TTGCCAACTC TTTGGTAGCG ATAAAGATTT CTTAGTCGCT 2100
AAAAATGATA CTGATACCTA TGGAGGAGGC TTCTATATCC AACACATTAC AGAATGTAGT 2160
GGGTTCATAG GTTGTCTCTT AGATAAACTT CCTGGCTCTT GGAGTCATAA ACCCCTCGTT 2220
TTAGAAGGGC AGCTCGCTTA TAGCCACGTC AGTAATGATC TGAAGACAAA GTATACTGCG 2280
TATCCTGAGG TGAAAGGTTC TTGGGGGAAT AATGCTTTTA ACATGATCTT GGGAGCTTCT 2340
TCTCATTCTT ATCCTGAATA CCTGCATTGT TTTGATACCT ATGCTCCATA CATCAAACTG 2400
AATCTGACCT ATATACGTCA GGACAGCTTC TCGGAGAAAG GTACAGAAGG AAGATCTTTT 2460
GATGACAGCA ACCTCTTCAA TTTATCTTTG CCTATAGGGG TGAAGTTTGA GAAGTTCTCT 2520
GATTGTAATG ACTTTTCTTA TGATCTGACT TTATCCTATG TTCCTGATCT TATCCGCAAT 2580
GATCCCAAAT GCACTACAGC ACTTGTAATC AGCGGAGCCT CTTGGGAAAC TTATGCCAAT 2640
AACTTAGCAC GACAGGCCTT GCAAGTGCGT GCAGGCAGTC ACTACGCCTT CTCTCCTATG 2700
TTTGAAGTGC TCGGCCAGTT TGTCTTTGAA GTTCGTGGAT CCTCACGGAT TTATAATGTA 2760
GATCTTGGGG GTAAGTTCCA ATTCTAGGAG CGTCTCTCAT GTCTCAGAAA TTCTG 2815

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 928 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS : single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

Met Lys Ser Gln Phe Ser Trp Leu Val Leu Ser Ser Thr Leu Ala Cys
1 5 10 15

Phe Thr Ser Cys Ser Thr Val Phe Ala Ala Thr Ala Glu Asn Ile Gly
20 25 30

Pro Ser Asp Ser Phe Asp Gly Ser Thr Asn Thr Gly Thr Tyr Thr Pro
35 40 45

Lys Asn Thr Thr Thr Gly Ile Asp Tyr Thr Leu Thr Gly Asp Ile Thr
50 55 60

Leu Gln Asn Leu Gly Asp Ser Ala Ala Leu Thr Lys Gly Cys Phe Ser
65 70 75 80

Asp Thr Thr Glu Ser Leu Ser Phe Ala Gly Lys Gly Tyr Ser Leu Ser
85 90 95

Phe Leu Asn Ile Lys Ser Ser Ala Glu Gly Ala Ala Leu Ser Val Thr
100 105 110

Thr Asp Lys Asn Leu Ser Leu Thr Gly Phe Ser Ser Leu Thr Phe Leu
115 120 125

Ala Ala Pro Ser Ser Val Ile Thr Thr Pro Ser Gly Lys Gly Ala Val
130 135 140

Lys Cys Gly Gly Asp Leu Thr Phe Asp Asn Asn Gly Thr Ile Leu Phe
145 150 155 160

Lys Gln Asp Tyr Cys Glu Glu Asn Gly Gly Ala Ile Ser Thr Lys Asn
165 170 175

Leu Ser Leu Lys Asn Ser Thr Gly Ser Ile Ser Phe Glu Gly Asn Lys
180 185 190

Ser Ser Ala Thr Gly Lys Lys Gly Gly Ala Ile Cys Ala Thr Gly Thr
195 200 205

Val Asp Ile Thr Asn Asn Thr Ala Pro Thr Leu Phe Ser Asn Asn Ile
210 215 220

Ala Glu Ala Ala Gly Gly Ala Ile Asn Ser Thr Gly Asn Cys Thr Ile
225 230 235 240

Thr Gly Asn Thr Ser Leu Val Phe Ser Glu Asn Ser Val Thr Ala Thr
245 250 255

Ala Gly Asn Gly Gly Ala Leu Ser Gly Asp Ala Asp Val Thr Ile Ser
260 265 270

Gly Asn Gln Ser Val Thr Phe Ser Gly Asn Gln Ala Val Ala Asn Gly
275 280 285

Gly Ala Ile Tyr Ala Lys Lys Leu Thr Leu Ala Ser Gly Gly Gly Gly
290 295 300

Gly Ile Ser Phe Ser Asn Asn Ile Val Gln Gly Thr Thr Ala Gly Asn
305 310 315 320

Gly Gly Ala Ile Ser Ile Leu Ala Ala Gly Glu Cys Ser Leu Ser Ala
325 330 335

Glu Ala Gly Asp Ile Thr Phe Asn Gly Asn Ala Ile Val Ala Thr Thr
340 345 350

Pro Gln Thr Thr Lys Arg Asn Ser Ile Asp Ile Gly Ser Thr Ala Lys
355 360 365

Ile Thr Asn Leu Arg Ala Ile Ser Gly His Ser Ile Phe Phe Tyr Asp
370 375 380

Pro Ile Thr Ala Asn Thr Ala Ala Asp Ser Thr Asp Thr Leu Asn Leu
385 390 395 400

Asn Lys Ala Asp Ala Gly Asn Ser Thr Asp Tyr Ser Gly Ser Ile Val
405 410 415

Phe Ser Gly Glu Lys Leu Ser Glu Asp Glu Ala Lys Val Ala Asp Asn
420 425 430

Leu Thr Ser Thr Leu Lys Gln Pro Val Thr Leu Thr Ala Gly Asn Leu
435 440 445

Val Leu Lys Arg Gly Val Thr Leu Asp Thr Lys Gly Phe Thr Gln Thr
450 455 460

Ala Gly Ser Ser Val Ile Met Asp Ala Gly Thr Thr Leu Lys Ala Ser
465 470 475 480

Thr Glu Glu Val Thr Leu Thr Gly Leu Ser Ile Pro Val Asp Ser Leu
485 490 495

Gly Glu Gly Lys Lys Val Val Ile Ala Ala Ser Ala Ala Ser Lys Asn
500 505 510

Val Ala Leu Ser Gly Pro Ile Leu Leu Leu Asp Asn Gln Gly Asn Ala
515 520 525

Tyr Glu Asn His Asp Leu Gly Lys Thr Gln Asp Phe Ser Phe Val Gln
530 535 540

Leu Ser Ala Leu Gly Thr Ala Thr Thr Thr Asp Val Pro Ala Val Pro
545 550 555 560

Thr Val Ala Thr Pro Thr His Tyr Gly Tyr Gln Gly Thr Trp Gly Met
565 570 575

Thr Trp Val Asp Asp Thr Ala Ser Thr Pro Lys Thr Lys Thr Ala Thr
580 585 590

Leu Ala Trp Thr Asn Thr Gly Tyr Leu Pro Asn Pro Glu Arg Gln Gly
595 600 605

Pro Leu Val Pro Asn Ser Leu Trp Gly Ser Phe Ser Asp Ile Gln Ala
610 615 620

Ile Gln Gly Val Ile Glu Arg Ser Ala Leu Thr Leu Cys Ser Asp Arg
625 630 635 640

Gly Phe Trp Ala Ala Gly Val Ala Asn Phe Leu Asp Lys Asp Lys Lys
645 650 655

Gly Glu Lys Arg Lys Tyr Arg His Lys Ser Gly Gly Tyr Ala Ile Gly
660 665 670

Gly Ala Ala Gln Thr Cys Ser Glu Asn Leu Ile Ser Phe Ala Phe Cys
675 680 685

Gln Leu Phe Gly Ser Asp Lys Asp Phe Leu Val Ala Lys Asn His Thr
690 695 700

Asp Thr Tyr Gly Gly Ala Phe Tyr Ile Gln His Ile Thr Glu Cys Ser
705 710 715 720

Gly Phe Ile Gly Cys Leu Leu Asp Lys Leu Pro Gly Ser Trp Ser His
725 730 735

Lys Pro Leu Val Leu Glu Gly Gln Leu Ala Tyr Ser His Val Ser Asn
740 745 750

Asp Leu Lys Thr Lys Tyr Thr Ala Tyr Pro Glu Val Lys Gly Ser Trp
755 760 765

Gly Asn Asn Ala Phe Asn Met Met Leu Gly Ala Ser Ser His Ser Tyr
770 775 780

Pro Glu Tyr Leu His Cys Phe Asp Thr Tyr Ala Pro Tyr Ile Lys Leu
785 790 795 800

Asn Leu Thr Tyr Ile Arg Gln Asp Ser Phe Ser Glu Lys Gly Thr Glu
805 810 815

Gly Arg Ser Phe Asp Asp Ser Asn Leu Phe Asn Leu Ser Leu Pro Ile
820 825 830

Gly Val Lys Phe Glu Lys Phe Ser Asp Cys Asn Asp Phe Ser Tyr Asp
835 840 845

Leu Thr Leu Ser Tyr Val Pro Asp Leu Ile Arg Asn Asp Pro Lys Cys
850 855 860

Thr Thr Ala Leu Val Ile Ser Gly Ala Ser Trp Glu Thr Tyr Ala Asn
865 870 875 880

Asn Leu Ala Arg Gln Ala Leu Gln Val Arg Ala Gly Ser His Tyr Ala
885 890 895

Phe Ser Pro Met Phe Glu Val Leu Gly Gln Phe Val Phe Glu Val Arg
900 905 910

Gly Ser Ser Arg Ile Tyr Asn Val Asp Leu Gly Gly Lys Phe Gln Phe
915 920 925







(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3052 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

ATGCGATTTT CGCTCTGCGG ATTTCCTCTA GTTTTTTCTT TAACATTGCT CTCAGTCTTC 60 
GACACTTCTT TGAGTGCTAC TACGATTTCT TTAACCCCAG AAGATAGTTT TCATGGAGAT 120 
AGTCAGAATG CAGAACGTTC TTATAATGTT CAAGCTGGGG ATGTCTATAG CCTTACTGGT 180
GATGTCTCAA TATCTAACGT CGATAACTCT GCATTAAATA AAGCCTGCTT CAATGTGACC 240

TCAGGAAGTG TGACGTTCGC AGGAAATCAT CATGGGTTAT ATTTTAATAA TATTTCCTCA 300 

GGAACTACAA AGGAAGGGGC TGTACTTTGT TGCCAAGATC CTCAAGCAAC GGCACGTTTT 360 

TCTGGGTTCT CCACGCTCTC TTTTATTCAG AGCCCCGGAG ATATTAAAGA ACAGGGATGT 420 

CTCTATTCAA AAAATGCACT TATGCTCTTA AACAATTATG TAGTGCGTTT TGAACAAAAC 480 

CAAAGTAAGA CTAAAGGCGG AGCTATTAGT GGGGCGAATG TTACTATAGT AGGCAACTAC 540 

GATTCCGTCT CTTTCTATCA GAATGCAGCC ACTTTTGGAG GTGCTATCCA TTCTTCAGGT 600 

CCCCTACAGA TTGCAGTAAA TCAGGCAGAG ATAAGATTTG CACAAAATAC TGCCAAGAAT 660 

GGTTCTGGAG GGGCTTTGTA CTCCGATGGT GATATTGATA TTGATCAGAA TGCTTATGTT 720 

CTATTTCGAG AAAATGAGGC ATTGACTACT GCTATAGGTA AGGGAGGGGC TGTCTGTTGT 780 

CTTCCCACTT CAGGAAGTAG TACTCCAGTT CCTATTGTGA CTTTCTCTGA CAATAAACAG 840 

TTAGTCTTTG AAAGAAACCA TTCCATAATG GGTGGCGGAG CCATTTATGC TAGGAAACTT 900 

AGCATCTCTT CAGGAGGTCC TACTCTATTT ATCAATAATA TATCATATGC AAATTCGCAA 960 

AATTTAGGTG GAGCTATTGC CATTGATACT GGAGGGGAGA TCAGTTTATC AGCAGAGAAA 1020 

GGAAGAATTA CATTGGAAGG AAAGGGGACG AGCTTACCGT TTTTGAATGG CATCCATCTT 1080 

TTACAAAATG CTAAATTCCT GAAATTACAG GCGAGAAATG GATGCTCTAT AGAATTTTAT 1140 

GATCCTATTA CTTCTGAAGC AGATGGGTCT ACCCAATTGA ATATCAACGG AGATCCTAAA 1200 

AATAAAGAGT ACACAGGGAC CATACTCTTT TCTGGAGAAA AGAGTCTAGC AAACGATCCT 1260

AGGGATTTTA AATCTACAAT CCCTCAGAAC GTCAACCTGT CTGCAGGATA CTTAGTTATT 1320 

AAAGAGGGGG CCGAAGTCAC AGTTTCAAAA TTCACGCAGT CTCCAGGATC GCATTTAGTT 1380 

TTAGATTTAG GAACCAAACT GATAGCCTCT AAGGAAGACA TTGCCATCAC AGGCCTCGCG 1440 

ATAGATATAG ATAGCTTAAG CTCATCCTCA ACAGCAGCTG TTATTAAAGC AAACACCGCA 1500 

AATAAACAGA TATCCGTGAC GGACTCTATA GAACTTATCT CGCCTACTGG CAATGCCTAT 1560 

GAAGATCTCA GAATGAGAAA TTCACAGACG TTCCCTCTGC TCTCTTTAGA GCCTGGAGCC 1620 

GGGGGTAGTG TGACTGTAAC TGCTGGAGAT TTCCTACCGG TAAGTCCCCA TTATGGTTTT 1680 

CAAGGCAATT GGAAATTAGC TTGGACAGGA ACTGGAAACA AAGTTGGAGA ATTCTTCTGG 1740 

GATAAAATAA ATTATAAGCC TAGACCTGAA AAAGAAGGAA ATTTAGTTCC TAATATCTTG 1800 

TGGGGGAATG CTGTAAATGT CAGATCCTTA ATGCAGGTTC AAGAGACCCA TGCATCGAGC 1860 

TTACAGACAG ATCGAGGGCT GTGGATCGAT GGAATTGGGA ATTTCTTCCA TGTATCTGCC 1920 

TCCGAAGACA ATATAAGGTA CCGTCATAAC AGCGGTGGAT ATGTTCTATC TGTAAATAAT 1980 

GAGATCACAC CTAAGCACTA TACTTCGATG GCATTTTCCC AACTCTTTAG TAGAGACAAG 2040 

GACTATGCGG TTTCCAACAA CGAATACAGA ATGTATTTAG GATCGTATCT CTATCAATAT 2100 

ACAACCTCCC TAGGGAATAT TTTCCGTTAT GCTTCGCGTA ACCCTAATGT AAACGTCGGG 2160 

ATTCTCTCAA GAAGGTTTCT TCAAAATCCT CTTATGATTT TTCATTTTTT GTGTGCTTAT 2220 

GGTCATGCCA CCAATGATAT GAAAACAGAC TACGCAAATT TCCCTATGGT GAAAAACAGC 2280 

TGGAGAAACA ATTGTTGGGC TATAGAGTGC GGAGGGAGCA TGCCTCTATT GGTATTTGAG 2340 

AACGGAAGAC TTTTCCAAGG TGCCATCCCA TTTATGAAAC TACAATTAGT TTATGCTTAT 2400 

CAGGGAGATT TCAAAGAGAC GACTGCAGAT GGCCGTAGAT TTAGTAATGG GAGTTTAACA 2460 

TCGATTTCTG TACCTCTAGG CATACGCTTT GAGAAGCTGG CACTTTCTCA GGATGTACTC 2520 

TATGACTTTA GTTTCTCCTA TATTCCTGAT ATTTTCCGTA AGGATCCCTC ATGTGAAGCT 2580 

GCTCTGGTGA TTAGCGGAGA CTCCTGGCTT GTTCCGGCAG CACACGTATC AAGACATGCT 2640 

TTTGTAGGGA GTGGAACGGG TCGGTATCAC TTTAACGACT ATACTGAGCT CTTATGTCGA 2700 

GGAAGTATAG AATGCCGCCC CCATGCTAGG AATTATAATA TAAACTGTGG AAGCAAATTT 2760 

CGTTTTTAGA AGGTTTCCAT TGCCTGTGTG GTTCCGGATC TTAACTATAA ATCCTGGACT 2820 

ATGGATCATA GGCATTGGGT TTCTCGAACT TGTGTGGAGA ATAACGACAT TTTATATGCA 2880 

TAACGGAATA CTCGTATCAC CTCAGCCCCT AGAGACATTC TTTAGGGGTT CTTTATTTGT 2940 

CTAAACTTCG TATTTTATCG AGAATCCTTT ACGTTCTTGG TTTGCTTGTC TCCGAGGAGT 3000 

TCTCTAACGA ATCATAGGGA TTCCAGGGTT CTGTTCCTTG AGTCCTTTGG CA 3052 

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 922 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 



Met Arg Phe Ser Leu Cys Gly Phe Pro Leu Val Phe Ser Leu Thr Leu

1 5 10 15

Leu Ser Val Phe Asp Thr Ser Leu Ser Ala Thr Thr Ile Ser Leu Thr

20 25 30

Pro Glu Asp Ser Phe His Gly Asp Ser Gln Asn Ala Glu Arg Ser Tyr

35 40 45

Asn Val Gln Ala Gly Asp Val Tyr Ser Leu Thr Gly Asp Val Ser Ile

50 55 60

Ser Asn Val Asp Asn Ser Ala Leu Asn Lys Ala Cys Phe Asn Val Thr

65 70 75 80

Ser Gly Ser Val Thr Phe Ala Gly Asn His His Gly Leu Tyr Phe Asn

85 90 95

Asn Ile Ser Ser Gly Thr Thr Lys Glu Gly Ala Val Leu Cys Cys Gln

100 105 110

Asp Pro Gln Ala Thr Ala Arg Phe Ser Gly Phe Ser Thr Leu Ser Phe

115 120 125

Ile Gln Ser Pro Gly Asp Ile Lys Glu Gln Gly Cys Leu Tyr Ser Lys

130 135 140

Asn Ala Leu Met Leu Leu Asn Asn Tyr Val Val Arg Phe Glu Gln Asn

145 150 155 160

Gln Ser Lys Thr Lys Gly Gly Ala Ile Ser Gly Ala Asn Val Thr Ile

165 170 175

Val Gly Asn Tyr Asp Ser Val Ser Phe Tyr Gln Asn Ala Ala Thr Phe

180 185 190

Gly Gly Ala Ile His Ser Ser Gly Pro Leu Gln Ile Ala Val Asn Gln

195 200 205

Ala Glu Ile Arg Phe Ala Gln Asn Thr Ala Lys Asn Gly Ser Gly Gly

210 215 220

Ala Leu Tyr Ser Asp Gly Asp Ile Asp Ile Asp Gln Asn Ala Tyr Val

225 230 235 240

Leu Phe Arg Glu Asn Glu Ala Leu Thr Thr Ala Ile Gly Lys Gly Gly

245 250 255

Ala Val Cys Cys Leu Pro Thr Ser Gly Ser Ser Thr Pro Val Pro Ile

260 265 270

Val Thr Phe Ser Asp Asn Lys Gln Leu Val Phe Glu Arg Asn His Ser

275 280 285

Ile Met Gly Gly Gly Ala Ile Tyr Ala Arg Lys Leu Ser Ile Ser Ser

290 295 300

Gly Gly Pro Thr Leu Phe Ile Asn Asn Ile Ser Tyr Ala Asn Ser Gln

305 310 315 320

Asn Leu Gly Gly Ala Ile Ala Ile Asp Thr Gly Gly Glu Ile Ser Leu

325 330 335

Ser Ala Glu Lys Gly Thr Ile Thr Phe Gln Gly Asn Arg Thr Ser Leu

340 345 350

Pro Phe Leu Asn Gly Ile His Leu Leu Gln Asn Ala Lys Phe Leu Lys

355 360 365

Leu Gln Ala Arg Asn Gly Cys Ser Ile Glu Phe Tyr Asp Pro Ile Thr

370 375 380

Ser Glu Ala Asp Gly Ser Thr Gln Leu Asn Ile Asn Gly Asp Pro Lys

385 390 395 400

Asn Lys Glu Tyr Thr Gly Thr Ile Leu Phe Ser Gly Glu Lys Ser Leu

405 410 415

Ala Asn Asp Pro Arg Asp Phe Lys Ser Thr Ile Pro Gln Asn Val Asn

420 425 430

Leu Ser Ala Gly Tyr Leu Val Ile Lys Glu Gly Ala Glu Val Thr Val

435 440 445

Ser Lys Phe Thr Gln Ser Pro Gly Ser His Leu Val Leu Asp Leu Gly

450 455 460

Thr Lys Leu Ile Ala Ser Lys Glu Asp Ile Ala Ile Thr Gly Leu Ala

465 470 475 480

Ile Asp Ile Asp Ser Leu Ser Ser Ser Ser Thr Ala Ala Val Ile Lys

485 490 495

Ala Asn Thr Ala Asn Lys Gln Ile Ser Val Thr Asp Ser Ile Glu Leu

500 505 510

Ile Ser Pro Thr Gly Asn Ala Tyr Glu Asp Leu Arg Met Arg Asn Ser

515 520 525

Gln Thr Phe Pro Leu Leu Ser Leu Glu Pro Gly Ala Gly Gly Ser Val

530 535 540

Thr Val Thr Ala Gly Asp Phe Leu Pro Val Ser Pro His Tyr Gly Phe

545 550 555 560

Gln Gly Asn Trp Lys Leu Ala Trp Thr Gly Thr Gly Asn Lys Val Gly

565 570 575

Glu Phe Phe Trp Asp Lys Ile Asn Tyr Lys Pro Arg Pro Glu Lys Glu

580 585 590

Gly Asn Leu Val Pro Asn Ile Leu Trp Gly Asn Ala Val Asn Val Arg

595 600 605

Ser Leu Met Gln Val Gln Glu Thr His Ala Ser Ser Leu Gln Thr Asp

610 615 620

Arg Gly Leu Trp Ile Asp Gly Ile Gly Asn Phe Phe His Val Ser Ala

625 630 635 640

Ser Glu Asp Asn Ile Arg Tyr Arg His Asn Ser Gly Gly Tyr Val Leu

645 650 655

Ser Val Asn Asn Glu Ile Thr Pro Lys His Tyr Thr Ser Met Ala Phe

660 665 670

Ser Gln Leu Phe Ser Arg Asp Lys Asp Tyr Ala Val Ser Asn Asn Glu

675 680 685

Tyr Arg Met Tyr Leu Gly Ser Tyr Leu Tyr Gln Tyr Thr Thr Ser Leu

690 695 700

Gly Asn Ile Phe Arg Tyr Ala Ser Arg Asn Pro Asn Val Asn Val Gly

705 710 715 720

Ile Leu Ser Arg Arg Phe Leu Gln Asn Pro Leu Met Ile Phe His Phe

725 730 735

Leu Cys Ala Tyr Gly His Ala Thr Asn Asp Met Lys Thr Asp Tyr Ala

740 745 750

Asn Phe Pro Met Val Lys Asn Ser Trp Arg Asn Asn Cys Trp Ala Ile

755 760 765

Glu Cys Gly Gly Ser Met Pro Leu Leu Val Phe Glu Asn Gly Arg Leu

770 775 780

Phe Gln Gly Ala Ile Pro Phe Met Lys Leu Gln Leu Val Tyr Ala Tyr

785 790 795 800

Gln Gly Asp Phe Lys Glu Thr Thr Ala Asp Gly Arg Arg Phe Ser Asn

805 810 815

Gly Ser Leu Thr Ser Ile Ser Val Pro Leu Gly Ile Arg Phe Glu Lys

820 825 830

Leu Ala Leu Ser Gln Asp Val Leu Tyr Asp Phe Ser Phe Ser Tyr Ile

835 840 845

Pro Asp Ile Phe Arg Lys Asp Pro Ser Cys Glu Ala Ala Leu Val Ile

850 855 860

Ser Gly Asp Ser Trp Leu Val Pro Ala Ala His Val Ser Arg His Ala

865 870 875 880

Phe Val Gly Ser Gly Thr Gly Arg Tyr His Phe Asn Asp Tyr Thr Glu

885 890 895

Leu Leu Cys Arg Gly Ser Ile Glu Cys Arg Pro His Ala Arg Asn Tyr

900 905 910

Asn Ile Asn Cys Gly Ser Lys Phe Arg Phe

915 920

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2526 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

ATGAAGATTC CACTCCGCTT TTTATTGATA TCATTAGTAC CTACGCTTTC TATGTCGAAT 60 

TTATTAGGAG CTGCTACTAC CGAAGAGCTA TCGGCTAGCA ATAGCTTCGA TGGAACTACA 120

TCAACAACAA GCTTTTCTAG TAAAACATCA TCGGCTACAG ATGGCACCAA TTATGTTTTT 180 

AAAGATTCTG TAGTTATAGA AAATGTACCC AAAACAGGGG AAACTCAGTC TACTAGTTGT 240 

TTTAAAAATG ACGCTGCAGC TGGAGATCTA AATTTCTTAG GAGGGGGATT TTCTTTCACA 300 

TTTAGCAATA TCGATGCAAC CACGGCTTCT GGAGCTGCTA TTGGAAGTGA AGCAGCTAAT 360 

AAGACAGTCA CGTTATCAGG ATTTTCGGCA CTTTCTTTTC TTAAATCCCC AGCAAGTACA 420 

GTGACTAATG GATTGGGAGC TATCAATGTT AAAGGGAATT TAAGCCTATT GGATAATGAT 480 

AAGGTATTGA TTCAGGACAA TTTCTCAACA GGAGATGGCG GAGCAATTAA TTGTGCAGGC 540 

TCCTTGAAGA TCGCAAACAA TAAGTCCCTT TCTTTTATTG GAAATAGTTC TTCAACACGT 600 

GGCGGAGCGA TTCATACCAA AAACCTCACA CTATCTTCTG GTGGGGAAAC TCTATTTCAG 660 

GGGAATACAG CGCCTACGGC TGCTGGTAAA GGAGGTGCTA TCGCGATTGC AGACTCTGGC 720 

ACCCTATCCA TTTCTGGAGA CAGTGGCGAC ATTATCTTTG AAGGCAATAC GATAGGAGCT 780 

ACAGGAACCG TCTCTCATAG TGCTATTGAT TTAGGAACTA GCGCTAAGAT AACTGCGTTA 840 

CGTGCTGCGC AAGGACATAC GATATACTTT TATGATCCGA TTACTGTAAC AGGATCGACA 900 

TCTGTTGCTG ATGCTCTCAA TATTAATAGC CCTGATACTG GAGATAACAA AGAGTATACG 960

GGAACCATAG TCTTTTCTGG AGAGAAGCTC ACGGAGGCAG AAGCTAAAGA TGAGAAGAAC 1020 

CGCACTTCTA AATTACTTCA AAATGTTGCT TTTAAAAATG GGACTGTAGT TTTAAAAGGT 1080 

GATGTCGTTT TAAGTGCGAA CGGTTTCTCT CAGGATGCAA ACTCTAAGTT GATTATGGAT 1140 

TTAGGGACGT CGTTGGTTGC AAACACCGAA AGTATCGAGT TAACGAATTT GGAAATTAAT 1200 

ATAGACTCTC TCAGGAACGG GAAAAAGATA AAACTCAGTG CTGCCACAGC TCAGAAAGAT 1260 

ATTCGTATAG ATCGTCCTGT TGTACTGGCA ATTAGCGATG AGAGTTTTTA TCAAAATGGC 1320

TTTTTGAATG AGGACCATTC CTATGATGGG ATTCTTGAGT TAGATGCTGG GAAAGACATC 1380 

GTGATTTCTG CAGATTCTCG CAGTATAAAT GCTGTACAAT CTCCGTATGG CTATCAGGGA 1440 

AAGTGGACAA TCAATTGGTC TACTGATGAT AAGAAAGCTA CGGTTTCTTG GGCAAAGCAA 1500 

AGTTTTAATC CCACTGCTGA GCAGGAGGCT CCGTTAGTTC CTAATCTTCT TTGGGGTTCT 1560 

TTTATAGATG TTCGTCCCTT CCAAAATTTT ATAGAGCTAG GTACTGAAGG TGCTCCTTAC 1620

GAAAAGAGAT TTTGGGTTGC AGGCATTTCC AATGTTTTGC ATAGGAGCGG TCGTGAAAAT 1680 

CAAAGGAAAT TCCGTCATGT GAGTGGAGGT GCTGTAGTAG GTGCTAGCAC GAGGATGCCG 1740 

GGTGGTGATA CCTTGTCTCT GGGTTTTGCT CAGCTCTTTG CGCGTGACAA AGACTACTTT 1800 

ATGAATACCA ATTTCGCAAA GACCTACGCA GGATCTTTAC GTTTGCAGCA CGATGCTTCC 1860 

CTATACTCTG TGGTGAGTAT CCTTTTAGGA GAGGGAGGAC TCCGCGAGAT CCTGTTGCCT 1920 

TATGTTTCCA AGACTCTGCC GTGCTCTTTC TATGGGCAGC TTAGCTACGG CCATACGGAT 1980 

CATCGCATGA AGACCGAGTC TCTACCCCCC CCCCCCCCGA CGCTCTCGAC GGATCATACT 2040 

TCTTGGGGAG GATATGTCTG GGCTGGAGAG CTGGGAACTC GAGTTGCTGT TGAAAATACC 2100 

AGCGGCAGAG GATTTTTCCG AGAGTACACT CCATTTGTAA AAGTCCAAGC TGTTTACTCG 2160 

CGCCAAGATA GCTTTGTTGA ACTAGGAGCT ATCAGTCGTG ATTTTAGTGA TTCGCATCTT 2220 

TATAACCTTG CGATTCCTCT TGGAATCAAG TTAGAGAAAC GGTTTGCAGA GCAATATTAT 2280 
CATGTTGTAG CGATGTATTC TCCAGATGTT TGTCGTAGTA ACCCCAAATG TACGACTACC 2340

CTACTTTCCA ACCAAGGGAG TTGGAAGACC AAAGGTTCGA ACTTAGCAAG ACAGGCTGGT 2400

ATTGTTCAGG CCTCAGGTTT TCGATCTTTG GGAGCTGCAG CAGAGCTTTT CGGGAACTTT 2460

GGCTTTGAAT GGCGGGGATC TTCTCGTAGC TATAATGTAG ATGCGGGTAG CAAAATCAAA 2520

TTTTAG 2526

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 841 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:



Met Lys Ile Pro Leu Arg Phe Leu Leu Ile Ser Leu Val Pro Thr Leu

1 5 10 15

Ser Met Ser Asn Leu Leu Gly Ala Ala Thr Thr Glu Glu Leu Ser Ala

20 25 30

Ser Asn Ser Phe Asp Gly Thr Thr Ser Thr Thr Ser Phe Ser Ser Lys

35 40 45

Thr Ser Ser Ala Thr Asp Gly Thr Asn Tyr Val Phe Lys Asp Ser Val

50 55 60

Val Ile Glu Asn Val Pro Lys Thr Gly Glu Thr Gln Ser Thr Ser Cys

65 70 75 80

Phe Lys Asn Asp Ala Ala Ala Gly Asp Leu Asn Phe Leu Gly Gly Gly

85 90 95

Phe Ser Phe Thr Phe Ser Asn Ile Asp Ala Thr Thr Ala Ser Gly Ala

100 105 110

Ala Ile Gly Ser Glu Ala Ala Asn Lys Thr Val Thr Leu Ser Gly Phe

115 120 125

Ser Ala Leu Ser Phe Leu Lys Ser Pro Ala Ser Thr Val Thr Asn Gly

130 135 140

Leu Gly Ala Ile Asn Val Lys Gly Asn Leu Ser Leu Leu Asp Asn Asp

145 150 155 160

Lys Val Leu Ile Gln Asp Asn Phe Ser Thr Gly Asp Gly Gly Ala Ile

165 170 175

Asn Cys Ala Gly Ser Leu Lys Ile Ala Asn Asn Lys Ser Leu Ser Phe

180 185 190

Ile Gly Asn Ser Ser Ser Thr Arg Gly Gly Ala Ile His Thr Lys Asn

195 200 205

Leu Thr Leu Ser Ser Gly Gly Glu Thr Leu Phe Gln Gly Asn Thr Ala

210 215 220

Pro Thr Ala Ala Gly Lys Gly Gly Ala Ile Ala Ile Ala Asp Ser Gly

225 230 235 240

Thr Leu Ser Ile Ser Gly Asp Ser Gly Asp Ile Ile Phe Glu Gly Asn

245 250 255

Thr Ile Gly Ala Thr Gly Thr Val Ser His Ser Ala Ile Asp Leu Gly

260 265 270

Thr Ser Ala Lys Ile Thr Ala Leu Arg Ala Ala Gln Gly His Thr Ile

275 280 285

Tyr Phe Tyr Asp Pro Ile Thr Val Thr Gly Ser Thr Ser Val Ala Asp

290 295 300

Ala Leu Asn Ile Asn Ser Pro Asp Thr Gly Asp Asn Lys Glu Tyr Thr

305 310 315 320

Gly Thr Ile Val Phe Ser Gly Glu Lys Leu Thr Glu Ala Glu Ala Lys

325 330 335

Asp Glu Lys Asn Arg Thr Ser Lys Leu Leu Gln Asn Val Ala Phe Lys

340 345 350

Asn Gly Thr Val Val Leu Lys Gly Asp Val Val Leu Ser Ala Asn Gly

355 360 365

Phe Ser Gln Asp Ala Asn Ser Lys Leu Ile Met Asp Leu Gly Thr Ser

370 375 380

Leu Val Ala Asn Thr Glu Ser Ile Glu Leu Thr Asn Leu Glu Ile Asn

385 390 395 400

Ile Asp Ser Leu Arg Asn Gly Lys Lys Ile Lys Leu Ser Ala Ala Thr

405 410 415

Ala Gln Lys Asp Ile Arg Ile Asp Arg Pro Val Val Leu Ala Ile Ser

420 425 430

Asp Glu Ser Phe Tyr Gln Asn Gly Phe Leu Asn Glu Asp His Ser Tyr

435 440 445

Asp Gly Ile Leu Glu Leu Asp Ala Gly Lys Asp Ile Val Ile Ser Ala

450 455 460

Asp Ser Arg Ser Ile Asn Ala Val Gln Ser Pro Tyr Gly Tyr Gln Gly

465 470 475 480

Lys Trp Thr Ile Asn Trp Ser Thr Asp Asp Lys Lys Ala Thr Val Ser

485 490 495

Trp Ala Lys Gln Ser Phe Asn Pro Thr Ala Glu Gln Glu Ala Pro Leu

500 505 510

Val Pro Asn Leu Leu Trp Gly Ser Phe Ile Asp Val Arg Pro Phe Gln

515 520 525

Asn Phe Ile Glu Leu Gly Thr Glu Gly Ala Pro Tyr Glu Lys Arg Phe

530 535 540

Trp Val Ala Gly Ile Ser Asn Val Leu His Arg Ser Gly Arg Glu Asn

545 550 555 560

Gln Arg Lys Phe Arg His Val Ser Gly Gly Ala Val Val Gly Ala Ser

565 570 575

Thr Arg Met Pro Gly Gly Asp Thr Leu Ser Leu Gly Phe Ala Gln Leu

580 585 590

Phe Ala Arg Asp Lys Asp Tyr Phe Met Asn Thr Asn Phe Ala Lys Thr

595 600 605

Tyr Ala Gly Ser Leu Arg Leu Gln His Asp Ala Ser Leu Tyr Ser Val

610 615 620

Val Ser Ile Leu Leu Gly Glu Gly Gly Leu Arg Glu Ile Leu Leu Pro

625 630 635 640

Tyr Val Ser Lys Thr Leu Pro Cys Ser Phe Tyr Gly Gln Leu Ser Tyr

645 650 655

Gly His Thr Asp His Arg Met Lys Thr Glu Ser Leu Pro Pro Pro Pro

660 665 670

Pro Thr Leu Ser Thr Asp His Thr Ser Trp Gly Gly Tyr Val Trp Ala

675 680 685

Gly Glu Leu Gly Thr Arg Val Ala Val Glu Asn Thr Ser Gly Arg Gly

690 695 700

Phe Phe Arg Glu Tyr Thr Pro Phe Val Lys Val Gln Ala Val Tyr Ser

705 710 715 720

Arg Gln Asp Ser Phe Val Glu Leu Gly Ala Ile Ser Arg Asp Phe Ser

725 730 735

Asp Ser His Leu Tyr Asn Leu Ala Ile Pro Leu Gly Ile Lys Leu Glu

740 745 750

Lys Arg Phe Ala Glu Gln Tyr Tyr His Val Val Ala Met Tyr Ser Pro

755 760 765

Asp Val Cys Arg Ser Asn Pro Lys Cys Thr Thr Thr Leu Leu Ser Asn

770 775 780

Gln Gly Ser Trp Lys Thr Lys Gly Ser Asn Leu Ala Arg Gln Ala Gly

785 790 795 800

Ile Val Gln Ala Ser Gly Phe Arg Ser Leu Gly Ala Ala Ala Glu Leu

805 810 815

Phe Gly Asn Phe Gly Phe Glu Trp Arg Gly Ser Ser Arg Ser Tyr Asn

820 825 830

Val Asp Ala Gly Ser Lys Ile Lys Phe

835 840







(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2787 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 



ATGAAGTCTT CTTTCCCCAA GTTTGTATTT TCTACATTTG CTATTTTCCC TTTGTCTATG 60

ATTGCTACCG AGACAGTTTT GGATTCAAGT GCGAGTTTCG ATGGGAATAA AAATGGTAAT 120

TTTTCAGTTC GTGAGAGTCA GGAAGATGCT GGAACTACCT ACCTATTTAA GGGAAATGTC 180

ACTCTAGAAA ATATTCCTGG AACAGGCACA GCAATCACAA AAAGCTGTTT TAACAACACT 240

AAGGGCGATT TGACTTTCAC AGGTAACGGG AACTCTCTAT TGTTCCAAAC GGTGGATGCA 300

GGGACTGTAG CAGGGGCTGC TGTTAACAGC AGCGTGGTAG ATAAATCTAC CACGTTTATA 360

GGGTTTTCTT CGCTATCTTT TATTGCGTCT CCTGGAAGTT CGATAACTAC CGGCAAAGGA 420

GCCGTTAGCT GCTCTACGGG TAGCTTGAAG TTTGACAAAA ATGTCAGTTT GCTCTTCAGC 480

AAAAACTTTT CAACGGATAA TGGCGGTGCT ATCACCGCAA AAACTCTTTC ATTAACAGGG 540

ACTACAATGT CAGCTCTGTT TTCTGAAAAT ACCTCCTCAA AGAAAGGCGG AGCCATTCAG 600

ACTTCCGATG CCCTTACCAT TACTGGAAAC CAAGGGGAAG TCTCTTTTTC TGACAATACT 660

TCTTCGGATT CTGGAGCTGC AATTTTTACA GAAGCCTCGG TGACTATTTC TAATAATGCT 720

AAAGTTTCCT TTATTGACAA TAAGGTCACA GGAGCGAGCT CCTCAACAAC GGGGGATATG 780

TCAGGAGGTG CTATCTGTGC TTATAAAACT AGTACAGATA CTAAGGTCAC CCTCACTGGA 840

AATCAGATGT TACTCTTCAG CAACAATACA TCGACAACAG CGGGAGGAGC TATCTATGTG 900

AAAAAGCTCG AACTGGCTTC CGGAGGACTT ACCCTATTCA GTAGAAATAG TGTCAATGGA 960

GGTACAGCTC CTAAAGGTGG AGCCATAGCT ATCGAAGATA GTGGGGAATT GAGTTTATCC 1020

GCCGATAGTG GTGACATTGT CTTTTTAGGG AATACAGTCA CTTCTACTAC TCCTGGGACG 1080

AATAGAAGTA GTATCGACTT AGGAACGAGT GCAAAGATGA CAGCTTTGCG TTCTGCTGCT 1140

GGTAGAGCCA TCTACTTCTA TGATCCCATA ACTACAGGAT CTTCCACAAC AGTTACAGAT 1200

GTCTTAAAAG TTAATGAGAC TCCGGCAGAT TCTGCACTAC AATATACAGG GAACATCATC 1260

TTCACAGGAG AAAAGTTATC AGAGACAGAG GCCGCAGATT CTAAAAATCT TACTTCGAAG 1320

CTACTACAGC CTGTAACTCT TTCAGGAGGT ACTCTATCTT TAAAACATGG AGTGACTCTG 1380

CAGACTCAGG CATTCACTCA ACAGGCAGAT TCTCGTCTCG AAATGGACGT AGGAACTACT 1440

CTAGAACCTG CTGATACTAG CACCATAAAC AATTTGGTCA TTAACATCAG TTCTATAGAC 1500

GGTGCAAAGA AGGCAAAAAT AGAAACCAAA GCTACGTCAA AAAATCTGAC TTTATCTGGA 1560

ACCATCACTT TATTGGACCC GACGGGCACG TTTTATGAAA ATCATAGTTT AAGAAATCCT 1620

CAGTCCTACG ACATCTTAGA GCTCAAAGCT TCTGGAACTG TAACAAGCAC CGCAGTGACT 1680

CCAGATCCTA TAATGGGTGA GAAATTCCAT TACGGCTATC AGGGAACTTG GGGCCCAATT 1740

GTTTGGGGGA CAGGGGCTTC TACGACTGCA ACCTTCAACT GGACTAAAAC TGGCTATATT 1800

CCTAATCCCG AGCGTATCGG CTCTTTAGTC CCTAATAGCT TATGGAATGC ATTTATAGAT 1860

ATTAGCTCTC TCCATTATCT TATGGAGACT GCAAACGAAG GGTTGCAGGG AGACCGTGCT 1920

TTTTGGTGTG CTGGATTATC TAACTTCTTC CATAAGGATA GTACAAAAAC ACGACGCGGG 1980

TTTCGCCATT TGAGTGGCGG TTATGTCATA GGAGGAAACC TACATACTTG TTCAGATAAG 2040

ATTCTTAGTG CTGCATTTTG TCAGCTCTTT GGAAGAGATA GAGACTACTT TGTAGCTAAG 2100

AATCAAGGTA CAGTCTACGG AGGAACTCTC TATTACCAGC ACAACGAAAC CTATATCTCT 2160 

CTTCCTTGCA AACTACGGCC TTGTTCGTTG TCTTATGTTC CTACAGAGAT TCCTGTTCTC 2220 

TTTTCAGGAA ACCTTAGCTA CACCCATACG GATAACGATC TGAAAACCAA GTATACAACA 2280 

TATCCTACTG TTAAAGGAAG CTGGGGGAAT GATAGTTTCG CTTTAGAATT CGGTGGAAGA 2340 

GCTCCGATTT GCTTAGATGA AAGTGCTCTA TTTGAGCAGT ACATGCCCTT CATGAAATTG 2400 

CAGTTTGTCT ATGCACATCA GGAAGGTTTT AAAGAACAGG GAACAGAAGC TCGTGAATTT 2460 

GGAAGTAGCC GTCTTGTGAA TCTTGCCTTA CCTATCGGGA TCCGATTTGA TAAGGAATCA 2520 

GACTGCCAAG ATGCAACGTA CAATCTAACT CTTGGTTATA CTGTGGATCT TGTTCGTAGT 2580 

AACCCCGACT GTACGACAAC ACTGCGAATT AGCGGTGATT CTTGGAAAAC CTTCGGTACG 2640 

AATTTGGCAA GACAAGCTTT AGTCCTTCGT GCAGGGAACC ATTTTTGCTT TAACTCAAAT 2700 

TTTGAAGCCT TTAGCCAATT TTCTTTTGAA TTGCGTGGGT CATCTCGCAA TTACAATGTA 2760 

GACTTAGGAG CAAAATACCA ATTCTAA 2787 

(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 928 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

Met Lys Ser Ser Phe Pro Lys Phe Val Phe Ser Thr Phe Ala Ile Phe

1 5 10 15

Pro Leu Ser Met Ile Ala Thr Glu Thr Val Leu Asp Ser Ser Ala Ser

20 25 30

Phe Asp Gly Asn Lys Asn Gly Asn Phe Ser Val Arg Glu Ser Gln Glu

35 40 45

Asp Ala Gly Thr Thr Tyr Leu Phe Lys Gly Asn Val Thr Leu Glu Asn

50 55 60

Ile Pro Gly Thr Gly Thr Ala Ile Thr Lys Ser Cys Phe Asn Asn Thr

65 70 75 80

Lys Gly Asp Leu Thr Phe Thr Gly Asn Gly Asn Ser Leu Leu Phe Gln

85 90 95

Thr Val Asp Ala Gly Thr Val Ala Gly Ala Ala Val Asn Ser Ser Val

100 105 110

Val Asp Lys Ser Thr Thr Phe Ile Gly Phe Ser Ser Leu Ser Phe Ile

115 120 125

Ala Ser Pro Gly Ser Ser Ile Thr Thr Gly Lys Gly Ala Val Ser Cys

130 135 140

Ser Thr Gly Ser Leu Lys Phe Asp Lys Asn Val Ser Leu Leu Phe Ser

145 150 155 160

Lys Asn Phe Ser Thr Asp Asn Gly Gly Ala Ile Thr Ala Lys Thr Leu

165 170 175

Ser Leu Thr Gly Thr Thr Met Ser Ala Leu Phe Ser Glu Asn Thr Ser

180 185 190

Ser Lys Lys Gly Gly Ala Ile Gln Thr Ser Asp Ala Leu Thr Ile Thr

195 200 205

Gly Asn Gln Gly Glu Val Ser Phe Ser Asp Asn Thr Ser Ser Asp Ser

210 215 220

Gly Ala Ala Ile Phe Thr Glu Ala Ser Val Thr Ile Ser Asn Asn Ala

225 230 235 240

Lys Val Ser Phe Ile Asp Asn Lys Val Thr Gly Ala Ser Ser Ser Thr

245 250 255

Thr Gly Asp Met Ser Gly Gly Ala Ile Cys Ala Tyr Lys Thr Ser Thr

260 265 270

Asp Thr Lys Val Thr Leu Thr Gly Asn Gln Met Leu Leu Phe Ser Asn

275 280 285

Asn Thr Ser Thr Thr Ala Gly Gly Ala Ile Tyr Val Lys Lys Leu Glu

290 295 300

Leu Ala Ser Gly Gly Leu Thr Leu Phe Ser Arg Asn Ser Val Asn Gly

305 310 315 320

Gly Thr Ala Pro Lys Gly Gly Ala Ile Ala Ile Glu Asp Ser Gly Glu

325 330 335

Leu Ser Leu Ser Ala Asp Ser Gly Asp Ile Val Phe Leu Gly Asn Thr

340 345 350

Val Thr Ser Thr Thr Pro Gly Thr Asn Arg Ser Ser Ile Asp Leu Gly

355 360 365

Thr Ser Ala Lys Met Thr Ala Leu Arg Ser Ala Ala Gly Arg Ala Ile

370 375 380

Tyr Phe Tyr Asp Pro Ile Thr Thr Gly Ser Ser Thr Thr Val Thr Asp

385 390 395 400

Val Leu Lys Val Asn Glu Thr Pro Ala Asp Ser Ala Leu Gln Tyr Thr

405 410 415

Gly Asn Ile Ile Phe Thr Gly Glu Lys Leu Ser Glu Thr Glu Ala Ala

420 425 430

Asp Ser Lys Asn Leu Thr Ser Lys Leu Leu Gln Pro Val Thr Leu Ser

435 440 445

Gly Gly Thr Leu Ser Leu Lys His Gly Val Thr Leu Gln Thr Gln Ala

450 455 460

Phe Thr Gln Gln Ala Asp Ser Arg Leu Glu Met Asp Val Gly Thr Thr

465 470 475 480

Leu Glu Pro Ala Asp Thr Ser Thr Ile Asn Asn Leu Val Ile Asn Ile

485 490 495

Ser Ser Ile Asp Gly Ala Lys Lys Ala Lys Ile Glu Thr Lys Ala Thr

500 505 510

Ser Lys Asn Leu Thr Leu Ser Gly Thr Ile Thr Leu Leu Asp Pro Thr

515 520 525

Gly Thr Phe Tyr Glu Asn His Ser Leu Arg Asn Pro Gln Ser Tyr Asp

530 535 540

Ile Leu Glu Leu Lys Ala Ser Gly Thr Val Thr Ser Thr Ala Val Thr

545 550 555 560

Pro Asp Pro Ile Met Gly Glu Lys Phe His Tyr Gly Tyr Gln Gly Thr

565 570 575

Trp Gly Pro Ile Val Trp Gly Thr Gly Ala Ser Thr Thr Ala Thr Phe

580 585 590

Asn Trp Thr Lys Thr Gly Tyr Ile Pro Asn Pro Glu Arg Ile Gly Ser

595 600 605

Leu Val Pro Asn Ser Leu Trp Asn Ala Phe Ile Asp Ile Ser Ser Leu

610 615 620

His Tyr Leu Met Glu Thr Ala Asn Glu Gly Leu Gln Gly Asp Arg Ala

625 630 635 640

Phe Trp Cys Ala Gly Leu Ser Asn Phe Phe His Lys Asp Ser Thr Lys

645 650 655

Thr Arg Arg Gly Phe Arg His Leu Ser Gly Gly Tyr Val Ile Gly Gly

660 665 670

Asn Leu His Thr Cys Ser Asp Lys Ile Leu Ser Ala Ala Phe Cys Gln

675 680 685

Leu Phe Gly Arg Asp Arg Asp Tyr Phe Val Ala Lys Asn Gln Gly Thr

690 695 700

Val Tyr Gly Gly Thr Leu Tyr Tyr Gln His Asn Glu Thr Tyr Ile Ser

705 710 715 720

Leu Pro Cys Lys Leu Arg Pro Cys Ser Leu Ser Tyr Val Pro Thr Glu

725 730 735

Ile Pro Val Leu Phe Ser Gly Asn Leu Ser Tyr Thr His Thr Asp Asn

740 745 750

Asp Leu Lys Thr Lys Tyr Thr Thr Tyr Pro Thr Val Lys Gly Ser Trp

755 760 765

Gly Asn Asp Ser Phe Ala Leu Glu Phe Gly Gly Arg Ala Pro Ile Cys

770 775 780

Leu Asp Glu Ser Ala Leu Phe Glu Gln Tyr Met Pro Phe Met Lys Leu

785 790 795 800

Gln Phe Val Tyr Ala His Gln Glu Gly Phe Lys Glu Gln Gly Thr Glu

805 810 815

Ala Arg Glu Phe Gly Ser Ser Arg Leu Val Asn Leu Ala Leu Pro Ile

820 825 830

Gly Ile Arg Phe Asp Lys Glu Ser Asp Cys Gln Asp Ala Thr Tyr Asn

835 840 845

Leu Thr Leu Gly Tyr Thr Val Asp Leu Val Arg Ser Asn Pro Asp Cys

850 855 860

Thr Thr Thr Leu Arg Ile Ser Gly Asp Ser Trp Lys Thr Phe Gly Thr

865 870 875 880

Asn Leu Ala Arg Gln Ala Leu Val Leu Arg Ala Gly Asn His Phe Cys

885 890 895

Phe Asn Ser Asn Phe Glu Ala Phe Ser Gln Phe Ser Phe Glu Leu Arg

900 905 910

Gly Ser Ser Arg Asn Tyr Asn Val Asp Leu Gly Ala Lys Tyr Gln Phe

915 920 925

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2757 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

ATGAGATCGT CTTTTTCCTT GTTATTAATA TCTTCATCTC TAGCCTTTCC TCTCTTAATG 60 

AGTGTTTCTG CAGATGCTGC CGATCTCACA TTAGGGAGTC GTGACAGTTA TAATGGTGAT 120 

ACAAGCACCA CAGAATTTAC TCCTAAAGCG GCAACTTCTG ATGCTAGTGG CACGACCTAT 180 

ATTCTCGATG GGGATGTCTC GATAAGCCAA GCAGGGAAAC AAACGAGCTT AACCACAAGT 240

TGTTTTTCTA ACACTGCAGG AAATCTTACC TTCTTAGGGA ACGGATTTTC TCTTCATTTT 300 

GACAATATTA TTTCGTCTAC TGTTGCAGGT GTTGTTGTTA GCAATACAGC AGCTTCTGGG 360 

ATTACGAAAT TCTCAGGATT TTCAACTCTT CGGATGCTTG CAGCTCCTAG GACCACAGGT 420 

AAAGGAGCCA TTAAAATTAC CGATGGTCTG GTGTTTGAGA GTATAGGGAA TCTTGACCAA 480 

AATGAAAATG CCTCTAGTGA AAATGGGGGA GCCATCAATA CGAAGACTTT GTCTTTGACT 540 

GGGAGTACGC GGTTTGTAGC GTTCCTTGGC AATAGCTCGT CGCAACAAGG GGGAGCGATC 600 

TATGCTTCTG GTGACTCTGT GATTTCTGAG AATGCAGGAA TCTTGAGCTT CGGAAACAAC 660 

AGTGCGACAA CATCAGGAGG CGCGATCTCT GCTGAAGGGA ACCTTGTGAT CTCCAATAAC 720

CAAAATATCT TTTTCGATGG CTGCAAAGCA ACTACAAATG GCGGAGCTAT TGATTGTAAC 780 

AAAGCAGGGG CGAACCCAGA CCCTATCTTG ACTCTTTCAG GAAATGAGAG CCTGCATTTT 840 

CTGAATAACA CAGCAGGAAA TAGTGGAGGT GCGATTTATA CCAAAAAATT GGTGTTATCC 900 

TCAGGACGAG GAGGAGTGTT ATTTTCTAAC AACAAAGCTG CGAATGCTAC TCCTAAAGGA 960 
GGGGCAATTG CGATTCTAGA TTCTGGAGAG ATTAGCATTT CTGCAGATCT CGGCAATATC 1020

ATTTTCGAGG GCAATACTAC GAGCACTACA GGAAGTCCTG CGAGTGTGAC CAGAAATGCT 1080 

ATAGATCTTG CATCGAATGC AAAATTTTTA AATCTCCGAG CGACTCGGGG AAATAAAGTT 1140 

ATTTTCTATG ATCCTATCAC GAGCTCAGGA GCTACTGATA AGCTCTCTTT GAATAAAGCT 1200 

GACGCAGGAT CTGGAAATAC CTATGAAGGC TACATCGTTT TCTCTGGAGA GAAACTCTCA 1260 

GAAGAGGAAC TTAAGAAACC TGACAATCTG AAGTCTACAT TTACACAGGC TGTAGAGCTT 1320 

GCTGCAGGTG CCTTAGTATT GAAAGATGGA GTGACTGTAG TTGCAAATAC TATAACGCAG 1380 

GTCGAGGGAT CGAAAGTCGT TATGGATGGA GGGACTACTT TTGAGGCAAG CGCTGAGGGG 1440 

GTCACTCTCA ATGGCCTAGC CATTAATATA GATTCCTTAG ATGGGACAAA TAAAGCTATC 1500 

ATTAAGGCGA CGGCAGCAAG TAAGGATGTT GCCTTATCAG GGCCTATCAT GCTTGTAGAT 1560 

GCTCAGGGGA ACTATTATGA GCATCATAAT CTCAGTCAAC AGCAGGTCTT TCCTTTAATA 1620 

GAGCTTTCTG CACAAGGAAC GATGACTACT ACAGATATCC CCGATACCCC AATTCTAAAT 1680 

ACTACGAATC ACTATGGGTA TCAAGGAACT GGAATAATTG TTTGGGTCGA CGATGCAACT 1740 

GCAAAAACAA AAAATGCTAC CTTAACTTGG ACTAAAACAG GATACAAGCC GAATCCAGAA 1800 

GGTGAGGGAC CTTTGGTTCC TAATAGGGTG TGGGGTTCTT TTGTGGATGT GCGCTCCATT 1860

CAGAGCCTCA TGGACCGGAG CACAAGTTCG TTATCTTCGT CAACAAATTT GTGGGTATCA 1920 

GGAATCGCGG ACTTTTTGCA TGAAGATCAG AAAGGAAACC AACGTAGTTA TCGTCATTCT 1980 

AGCGCGGGTT ATGCATTAGG AGGAGGATTC TTCACGGCTT CTGAAAATTT CTTTAATTTT 2040

GCTTTTTGTC AGCTTTTTGG CTACGACAAG GACCATCTTG TGGCTAAGAA CCATACCCAT 2100 

GTATATGCAG GGGCAATGAG TTACCGACAC CTCGGAGAGT CTAAGACCCT CGCTAAGATT 2160 

TTGTCAGGAA ATTCTGACTC CCTACCTTTT GTCTTCAATG CTCGGTTTGC TTATGGCCAT 2220 

ACCGACAATA ACATGACCAC AAAGTACACT GGCTATTCTC CTGTTAAGGG AAGCTGGGGA 2280 

AATGATGCCT TCGGTATAGA ATGTGGAGGA GCTATCCCGG TAGTTGCTTC AGGACGTCGG 2340 

TCTTGGGTGG ATACCCACAC GCCATTTCTA AACCTAGAGA TGATCTATGC ACATCAGAAT 2400 

GACTTTAAGG AAAACGGCAC AGAAGGCCGT TCTTTCCAAA GTGAAGACCT CTTCAATCTA 2460 

GCGGTTCCTG TAGGGATAAA ATTTGAGAAA TTCTCCGATA AGTCTACGTA TGATCTCTCC 2520 

ATAGCTTACG TTCCGGATGT GATTCGTAAT GATCCAGGCT GCACGACAAC TCTTATGGTT 2580 

TCTGGGGATT CTTGGTCGAC ATGTGGTACA AGCTTGTCTA GACAAGCTCT TCTTGTACGT 2640 

GCTGGAAATC ATCATGCCTT TGCTTCAAAC TTTGAAGTTT TCAGTCAGTT TGAAGTCGAG 2700 

TTGCGAGGTT CTTCTCGTAG CTATGCTATC GATCTTGGAG GAAGATTCGG ATTTTAA 2757 

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 918 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

Met Arg Ser Ser Phe Ser Leu Leu Leu Ile Ser Ser Ser Leu Ala Phe
1 5 10 15

Pro Leu Leu Met Ser Val Ser Ala Asp Ala Ala Asp Leu Thr Leu Gly
20 25 30

Ser Arg Asp Ser Tyr Asn Gly Asp Thr Ser Thr Thr Glu Phe Thr Pro
35 40 45

Lys Ala Ala Thr Ser Asp Ala Ser Gly Thr Thr Tyr Ile Leu Asp Gly
50 55 60

Asp Val Ser Ile Ser Gln Ala Gly Lys Gln Thr Ser Leu Thr Thr Ser
65 70 75 80

Cys Phe Ser Asn Thr Ala Gly Asn Leu Thr Phe Leu Gly Asn Gly Phe
85 90 95

Ser Leu His Phe Asp Asn Ile Ile Ser Ser Thr Val Ala Gly Val Val
100 105 110

Val Ser Asn Thr Ala Ala Ser Gly Ile Thr Lys Phe Ser Gly Phe Ser
115 120 125

Thr Leu Arg Met Leu Ala Ala Pro Arg Thr Thr Gly Lys Gly Ala Ile
130 135 140

Lys Ile Thr Asp Gly Leu Val Phe Glu Ser Ile Gly Asn Leu Asp Gln
145 150 155 160

Asn Glu Asn Ala Ser Ser Glu Asn Gly Gly Ala Ile Asn Thr Lys Thr
165 170 175

Leu Ser Leu Thr Gly Ser Thr Arg Phe Val Ala Phe Leu Gly Asn Ser
180 185 190

Ser Ser Gln Gln Gly Gly Ala Ile Tyr Ala Ser Gly Asp Ser Val Ile
195 200 205

Ser Glu Asn Ala Gly Ile Leu Ser Phe Gly Asn Asn Ser Ala Thr Thr
210 215 220

Ser Gly Gly Ala Ile Ser Ala Glu Gly Asn Leu Val Ile Ser Asn Asn
225 230 235 240

Gln Asn Ile Phe Phe Asp Gly Cys Lys Ala Thr Thr Asn Gly Gly Ala
245 250 255

Ile Asp Cys Asn Lys Ala Gly Ala Asn Pro Asp Pro Ile Leu Thr Leu
260 265 270

Ser Gly Asn Glu Ser Leu His Phe Leu Asn Asn Thr Ala Gly Asn Ser
275 280 285

Gly Gly Ala Ile Tyr Thr Lys Lys Leu Val Leu Ser Ser Gly Arg Gly
290 295 300

Gly Val Leu Phe Ser Asn Asn Lys Ala Ala Asn Ala Thr Pro Lys Gly
305 310 315 320

Gly Ala Ile Ala Ile Leu Asp Ser Gly Glu Ile Ser Ile Ser Ala Asp
325 330 335

Leu Gly Asn Ile Ile Phe Glu Gly Asn Thr Thr Ser Thr Thr Gly Ser
340 345 350

Pro Ala Ser Val Thr Arg Asn Ala Ile Asp Leu Ala Ser Asn Ala Lys
355 360 365

Phe Leu Asn Leu Arg Ala Thr Arg Gly Asn Lys Val Ile Phe Tyr Asp
370 375 380

Pro Ile Thr Ser Ser Gly Ala Thr Asp Lys Leu Ser Leu Asn Lys Ala
385 390 395 400

Asp Ala Gly Ser Gly Asn Thr Tyr Glu Gly Tyr Ile Val Phe Ser Gly
405 410 415

Glu Lys Leu Ser Glu Glu Glu Leu Lys Lys Pro Asp Asn Leu Lys Ser
420 425 430

Thr Phe Thr Gln Ala Val Glu Leu Ala Ala Gly Ala Leu Val Leu Lys
435 440 445

Asp Gly Val Thr Val Val Ala Asn Thr Ile Thr Gln Val Glu Gly Ser
450 455 460

Lys Val Val Met Asp Gly Gly Thr Thr Phe Glu Ala Ser Ala Glu Gly
465 470 475 480

Val Thr Leu Asn Gly Leu Ala Ile Asn Ile Asp Ser Leu Asp Gly Thr
485 490 495

Asn Lys Ala Ile Ile Lys Ala Thr Ala Ala Ser Lys Asp Val Ala Leu
500 505 510

Ser Gly Pro Ile Met Leu Val Asp Ala Gln Gly Asn Tyr Tyr Glu His
515 520 525

His Asn Leu Ser Gln Gln Gln Val Phe Pro Leu Ile Glu Leu Ser Ala
530 535 540

Gln Gly Thr Met Thr Thr Thr Asp Ile Pro Asp Thr Pro Ile Leu Asn
545 550 555 560

Thr Thr Asn His Tyr Gly Tyr Gln Gly Thr Gly Ile Ile Val Trp Val
565 570 575
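
Asp Asp Ala Thr Ala Lys Thr Lys Asn Ala Thr Leu Thr Trp Thr Lys
580 585 590

Thr Gly Tyr Lys Pro Asn Pro Glu Gly Glu Gly Pro Leu Val Pro Asn
595 600 605

Arg Val Trp Gly Ser Phe Val Asp Val Arg Ser Ile Gln Ser Leu Met
610 615 620

Asp Arg Ser Thr Ser Ser Leu Ser Ser Ser Thr Asn Leu Trp Val Ser
625 630 635 640

Gly Ile Ala Asp Phe Leu His Glu Asp Gln Lys Gly Asn Gln Arg Ser
645 650 655

Tyr Arg His Ser Ser Ala Gly Tyr Ala Leu Gly Gly Gly Phe Phe Thr
660 665 670

Ala Ser Glu Asn Phe Phe Asn Phe Ala Phe Cys Gln Leu Phe Gly Tyr
675 680 685

Asp Lys Asp His Leu Val Ala Lys Asn His Thr His Val Tyr Ala Gly
690 695 700

Ala Met Ser Tyr Arg His Leu Gly Glu Ser Lys Thr Leu Ala Lys Ile
705 710 715 720

Leu Ser Gly Asn Ser Asp Ser Leu Pro Phe Val Phe Asn Ala Arg Phe
725 730 735

Ala Tyr Gly His Thr Asp Asn Asn Met Thr Thr Lys Tyr Thr Gly Tyr
740 745 750

Ser Pro Val Lys Gly Ser Trp Gly Asn Asp Ala Phe Gly Ile Glu Cys
755 760 765

Gly Gly Ala Ile Pro Val Val Ala Ser Gly Arg Arg Ser Trp Val Asp
770 775 780

Thr His Thr Pro Phe Leu Asn Leu Glu Met Ile Tyr Ala His Gln Asn
785 790 795 800

Asp Phe Lys Glu Asn Gly Thr Glu Gly Arg Ser Phe Gln Ser Glu Asp
805 810 815

Leu Phe Asn Leu Ala Val Pro Val Gly Ile Lys Phe Glu Lys Phe Ser
820 825 830

Asp Lys Ser Thr Tyr Asp Leu Ser Ile Ala Tyr Val Pro Asp Val Ile
835 840 845

Arg Asn Asp Pro Gly Cys Thr Thr Thr Leu Met Val Ser Gly Asp Ser
850 855 860

Trp Ser Thr Cys Gly Thr Ser Leu Ser Arg Gln Ala Leu Leu Val Arg
865 870 875 880

Ala Gly Asn His His Ala Phe Ala Ser Asn Phe Glu Val Phe Ser Gln
885 890 895

Phe Glu Val Glu Leu Arg Gly Ser Ser Arg Ser Tyr Ala Ile Asp Leu
900 905 910

Gly Gly Arg Phe Gly Phe
915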

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2787 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
ATGAAATCCT CTCTTCATTG GTTTGTAATC TCGTCATCTT TAGCACTTCC CTTGTCACTA 60

AATTTCTCTG CGTTTGCTGC TGTTGTTGAA ATCAATCTAG GACCTACCAA TAGCTTCTCT 120 

GGACCAGGAA CCTACACTCC TCCAGCCCAA ACAACAAATG CAGATGGAAC TATCTATAAT 180 

CTAACAGGGG ATGTCTCAAT CACCAATGCA GGATCTCCGA CAGCTCTAAC CGCTTCCTGC 240 

TTTAAAGAAA CTACTGGGAA TCTTTCTTTC CAAGGCCACG GCTACCAATT TCTCCTACAA 300 

AATATCGATG CGGGAGCGAA CTGTACCTTT ACCAATACAG CTGCAAATAA GCTTCTCTCC 360 

TTTTCAGGAT TCTCCTATTT GTCACTAATA CAAACCACGA ATGCTACCAC AGGAACAGGA 420 

GCCATCAAGT CCACAGGAGC TTGTTCTATT CAGTCGAACT ATAGTTGCTA CTTTGGCCAA 480 

AACTTTTCTA ATGACAATGG AGGCGCCCTC CAAGGCAGCT CTATCAGTCT ATCGCTAAAC 540 

CCCAACCTAA CGTTTGCCAA AAACAAAGCA ACGCAAAAAG GGGGTGCCCT CTATTCCACG 600 

GGAGGGATTA CAATTAACAA TACGTTAAAC TCAGCATCAT TTTCTGAAAA TACCGCGGCG 660 

AACAATGGCG GAGCCATTTA CACGGAAGCT AGCAGTTTTA TTAGCAGCAA CAAAGCAATT 720 

AGCTTTATAA ACAATAGTGT GACCGCAACC TCAGCTACAG GGGGAGCCAT TTACTGTAGT 780 

AGTACATCAG CCCCCAAACC AGTCTTAACT CTATCAGACA ACGGGGAACT GAACTTTATA 840 

GGAAATAGAG GAATTACTAG TGGTGGGGGG ATTTATACTG AGAATCTAGT TCTTTCTTCT 900

GGAGGACCTA CGCTTTTTAA AAACAACTCT GCTATAGATA CTGCAGCTCC CTTAGGAGGA 960

GCAATTGCGA TTGCTGACTC TGGATCTTTG AGTCTTTCGG CTCTTGGTGG AGACATCACT 1020

TTTGAAGGAA ACACAGTAGT CAAAGGAGCT TCTTCGAGTC AGACCACTAC CAGAAATTCT 1080

ATTAACATCG GAAACACCAA TGCTAAGATT GTACAGCTGC GAGCCTCTCA AGGCAATACT 1140 

ATCTACTTCT ATGATCCTAT AACAACTAAC CATACTGCAG CTCTCTCAGA TGCTCTAAAC 1200 

TTAAATGGTC CTGACCTTGC AGGGAATCCT GCATATCAAG GAACCATCGT ATTTTCTGGA 1260 

GAGAAGCTCT CGGAAGCAGA AGCTGCAGAA GCTGATAATC TCAAATCTAC AATTCAGCAA 1320 

CCTCTAACTC TTGCGGGAGG GCAACTCTCT CTTAAATCAG GAGTCACTCT AGTTGCTAAG 1380

TCCTTTTCGC AATCTCCGGG CTCTACCCTC CTCATGGATG CAGGGACCAC ATTAGAAACC 1440 

GCTGATGGGA TCACTATCAA TAATCTTGTT CTCAATGTAG ATTCCTTAAA AGAGACCAAG 1500 

AAGGCTACGC TAAAAGCAAC ACAAGCAAGT CAGACAGTCA CTTTATCTGG ATCGCTCTCT 1560 

CTTGTAGATC CTTCTGGAAA TGTCTACGAA GATGTCTCTT GGAATAACCC TCAAGTCTTT 1620 

TCTTGTCTCA CTCTTACTGC TGACGACCCC GCGAATATTC ACATCACAGA CTTAGCTGCT 1680 

GATCCCCTAG AAAAAAATCC TATCCATTGG GGATACCAAG GGAATTGGGC ATTATCTTGG 1740 

CAAGAGGATA CTGCGACTAA ATCCAAAGCA GCGACTCTTA CCTGGACAAA AACAGGATAC 1800 

AATCCGAATC CTGAGCGTCG TGGAACCTTA GTTGCTAACA CGCTATGGGG ATCCTTTGTT 1860 

GATGTGCGCT CCATACAACA GCTTGTAGCC ACTAAAGTAC GCCAATCTCA AGAAACTCGC 1920 

GGCATCTGGT GTGAAGGGAT CTCGAACTTC TTCCATAAAG ATAGCACGAA GATAAATAAA 1980 

GGTTTTCGCC ACATAAGTGC AGGTTATGTT GTAGGAGCGA CTACAACATT AGCTTCTGAT 2040 

AATCTTATCA CTGCAGCCTT CTGCCAATTA TTCGGGAAAG ATAGAGATCA CTTTATAAAT 2100 

AAAAATAGAG CTTCTGCCTA TGCAGCTTCT CTCCATCTCC AGCATCTAGC GACCTTGTCT 2160 

TCTCCAAGCT TGTTACGCTA CCTTCCTGGA TCTGAAAGTG AGCAGCCTGT CCTCTTTGAT 2220 

GCTCAGATCA GCTATATCTA TAGTAAAAAT ACTATGAAAA CCTATTACAC CCAAGCACCA 2280 

AAGGGAGAGA GCTCGTGGTA TAATGACGGT TGCGCTCTGG AACTTGCGAG CTCCCTACCA 2340 

CACACTGCTT TAAGCCATGA GGGTCTCTTC CACGCGTATT TTCCTTTCAT CAAAGTAGAA 2400 

GCTTCGTACA TACACCAAGA TAGCTTCAAA GAACGTAATA CTACCTTGGT ACGATCTTTC 2460 

GATAGCGGTG ATTTAATTAA CGTCTCTGTG CCTATTGGAA TTACCTTCGA GAGATTCTCG 2520 

AGAAACGAGC GTGCGTCTTA CGAAGCTACT GTCATCTACG TTGCCGATGT CTATCGTAAG 2580 

AATCCTGACT GCACGACAGC TCTCCTAATC AACAATACCT CGTGGAAAAC TACAGGAACG 2640 

AATCTCTCAA GACAAGCTGG TATCGGAAGA GCAGGGATCT TTTATGCCTT CTCTCCAAAT 2700 

CTTGAGGTCA CAAGTAACCT ATCTATGGAA ATTCGTGGAT CTTCACGCAG CTACAATGCA 2760 

GATCTTGGAG GTAAGTTCCA GTTCTAA 2787 

(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 928 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

Met Lys Ser Ser Leu His Trp Phe Val Ile Ser Ser Ser Leu Ala Leu
1 5 10 15

Pro Leu Ser Leu Asn Phe Ser Ala Phe Ala Ala Val Val Glu Ile Asn
20 25 30

Leu Gly Pro Thr Asn Ser Phe Ser Gly Pro Gly Thr Tyr Thr Pro Pro
35 40 45

Ala Gln Thr Thr Asn Ala Asp Gly Thr Ile Tyr Asn Leu Thr Gly Asp
50 55 60

Val Ser Ile Thr Asn Ala Gly Ser Pro Thr Ala Leu Thr Ala Ser Cys
65 70 75 80

Phe Lys Glu Thr Thr Gly Asn Leu Ser Phe Gln Gly His Gly Tyr Gln
85 90 95

Phe Leu Leu Gln Asn Ile Asp Ala Gly Ala Asn Cys Thr Phe Thr Asn
100 105 110

Thr Ala Ala Asn Lys Leu Leu Ser Phe Ser Gly Phe Ser Tyr Leu Ser
115 120 125

Leu Ile Gln Thr Thr Asn Ala Thr Thr Gly Thr Gly Ala Ile Lys Ser
130 135 140

Thr Gly Ala Cys Ser Ile Gln Ser Asn Tyr Ser Cys Tyr Phe Gly Gln
145 150 155 160

Asn Phe Ser Asn Asp Asn Gly Gly Ala Leu Gln Gly Ser Ser Ile Ser
165 170 175

Leu Ser Leu Asn Pro Asn Leu Thr Phe Ala Lys Asn Lys Ala Thr Gln
180 185 190

Lys Gly Gly Ala Leu Tyr Ser Thr Gly Gly Ile Thr Ile Asn Asn Thr
195 200 205

Leu Asn Ser Ala Ser Phe Ser Glu Asn Thr Ala Ala Asn Asn Gly Gly
210 215 220

Ala Ile Tyr Thr Glu Ala Ser Ser Phe Ile Ser Ser Asn Lys Ala Ile
225 230 235 240

Ser Phe Ile Asn Asn Ser Val Thr Ala Thr Ser Ala Thr Gly Gly Ala
245 250 255

Ile Tyr Cys Ser Ser Thr Ser Ala Pro Lys Pro Val Leu Thr Leu Ser
260 265 270

Asp Asn Gly Glu Leu Asn Phe Ile Gly Asn Thr Ala Ile Thr Ser Gly
275 280 285

Gly Ala Ile Tyr Thr Asp Asn Leu Val Leu Ser Ser Gly Gly Pro Thr
290 295 300

Leu Phe Lys Asn Asn Ser Ala Ile Asp Thr Ala Ala Pro Leu Gly Gly
305 310 315 320

Ala Ile Ala Ile Ala Asp Ser Gly Ser Leu Ser Leu Ser Ala Leu Gly
325 330 335

Gly Asp Ile Thr Phe Glu Gly Asn Thr Val Val Lys Gly Ala Ser Ser
340 345 350

Ser Gln Thr Thr Thr Arg Asn Ser Ile Asn Ile Gly Asn Thr Asn Ala
355 360 365

Lys Ile Val Gln Leu Arg Ala Ser Gln Gly Asn Thr Ile Tyr Phe Tyr
370 375 380

Asp Pro Ile Thr Thr Asn His Thr Ala Ala Leu Ser Asp Ala Leu Asn
385 390 395 400

Leu Asn Gly Pro Asp Leu Ala Gly Asn Pro Ala Tyr Gln Gly Thr Ile
405 410 415

Val Phe Ser Gly Glu Lys Leu Ser Glu Ala Glu Ala Ala Glu Ala Asp
420 425 430

Asn Leu Lys Ser Thr Ile Gln Gln Pro Leu Thr Leu Ala Gly Gly Gln
435 440 445

Ala Ser Tyr Ile His Gln Asp Ser Phe Lys Glu Arg Asn Thr Thr Leu
805 810 815

Val Arg Ser Phe Asp Ser Gly Asp Leu Ile Asn Val Ser Val Pro Ile
820 825 830

Gly Ile Thr Phe Glu Arg Phe Ser Arg Asn Glu Arg Ala Ser Tyr Glu
835 840 845

Ala Thr Val Ile Tyr Val Ala Asp Val Tyr Arg Lys Asn Pro Asp Cys
850 855 860

Thr Thr Ala Leu Leu Ile Asn Asn Thr Ser Trp Lys Thr Thr Gly Thr
865 870 875 880

Asn Leu Ser Arg Gln Ala Gly Ile Gly Arg Ala Gly Ile Phe Tyr Ala
885 890 895

Phe Ser Pro Asn Leu Glu Val Thr Ser Asn Leu Ser Met Glu Ile Arg
900 905 910

Gly Ser Ser Arg Ser Tyr Asn Ala Asp Leu Gly Gly Lys Phe Gln Phe
915 920 925

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2793 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

ATGAAAATAC CCTTGCACAA ACTCCTGATC TCTTCGACTC TTGTCACTCC CATTCTATTG 60 

AGCATTGCAA CTTACGGAGC AGATGCTTCT TTATCCCCTA CAGATAGCTT TGATGGAGCG 120 

GGCGGCTCTA CATTTACTCC AAAATCTACA GCAGATGCCA ATGGAACGAA CTATGTCTTA 180 

TCAGGAAATG TCTATATAAA CGATGCTGGG AAAGGCACAG CATTAACAGG CTGCTGCTTT 240 

ACAGAAACTA CGGGTGATCT GACATTTACT GGAAAGGGAT ACTCATTTTC ATTCAACACG 300 

GTAGATGCGG GTTCGAATGC AGGAGCTGCG GCAAGCACAA CTGCTGATAA AGCCCTAACA 360 

TTCACAGGAT TTTCTAACCT TTCCTTCATT GCAGCTCCTG GAACTACAGT TGCTTCAGGA 420 

AAAAGTACTT TAAGTTCTGC AGGAGCCTTA AATCTTACCG ATAATGGAAC GATTCTCTTT 480 

AGCCAAAACG TCTCCAATGA AGCTAATAAC AATGGCGGAG CGATCACCAC AAAAACTCTT 540

TCTATTTCTG GGAATACCTC TTCTATAACC TTCACTAGTA ATAGCGCAAA AAAATTAGGT 600 

GGAGCGATCT ATAGCTCTGC GGCTGCAAGT ATTTCAGGAA ACACCGGCCA GTTAGTCTTT 660 

ATGAATAATA AAGGAGAAAC TGGGGGCGGG GCTCTGGGCT TTGAAGCCAG CTCCTCGATT 720 

ACTCAAAATA GCTCCCTTTT CTTCTCTGGA AACACTGCAA CAGATGCTGC AGGCAAGGGC 780 

GGGGCCATTT ATTGTGAAAA AACAGGAGAG ACTCCTACTC TTACTATCTC TGGAAATAAA 840 

AGTCTGACCT TCGCCGAGAA CTCTTCAGTA ACTCAAGGCG GAGCAATCTG TGCCCATGGT 900

CTAGATCTTT CCGCTGCTGG CCCTACCCTA TTTTCAAATA ATAGATGCGG GAACACAGCT 960 

GCAGGCAAGG GCGGCGCTAT TGCAATTGCC GACTCTGGAT CTTTAAGTCT CTCTGCAAAT 1020 

CAAGGAGACA TCACGTTCCT TGGCAACACT CTAACCTCAA CCTCCGCGCC AACATCGACA 1080 

CGGAATGCTA TCTACCTGGG ATCGTCAGCA AAAATTACGA ACTTAAGGGC AGCCCAAGGC 1140 

CAATCTATCT ATTTCTATGA TCCGATTGCA TCTAACACCA CAGGAGCTTC AGACGTTCTG 1200 

ACCATCAACC AACCGGATAG CAACTCGCCT TTAGATTATT CAGGAACGAT TGTATTTTCT 1260 

GGGGAAAAGC TCTCTGCAGA TGAAGCGAAA GCTGCTGATA ACTTCACATC TATATTAAAG 1320 

CAACCATTGG CTCTAGCCTC TGGAACCTTA GCACTCAAAG GAAATGTCGA GTTAGATGTC 1380 

AATGGTTTCA CACAGACTGA AGGCTCTACA CTCCTCATGC AACCAGGAAC AAAGCTCAAA 1440 

GCAGATACTG AAGCTATCAG TCTTACCAAA CTTGTCGTTG ATCTTTCTGC CTTAGAGGGA 1500 

AATAAGAGTG TGTCCATTGA AACAGCAGGA GCCAACAAAA CTATAACTCT AACCTCTCCT 1560

CTTGTTTTCC AAGATAGTAG CGGCAATTTT TATGAAAGCC ATACGATAAA CCAAGCCTTC 1620 

ACGCAGCCTT TGGTGGTATT CACTGCTGCT ACTGCTGCTA GCGATATTTA TATCGATGCG 1680

CTTCTCACTT CTCCAGTACA AACTCCAGAA CCTCATTACG GGTATCAGGG ACATTGGGAA 1740 

GCCACTTGGG CAGACACATC AACTGCAAAA TCAGGAACTA TGACTTGGGT AACTACGGGC 1800 

TACAACCCTA ATCCTGAGCG TAGAGCTTCC GTAGTTCCCG ATTCATTATG GGCATCCTTT 1860 

ACTGACATTC GCACTCTACA GCAGATCATG ACATCTCAAG CGAATAGTAT CTATCAGCAA 1920 

CGAGGACTCT GGGCATCAGG AACTGCGAAT TTCTTCCATA AGGATAAATC AGGAACTAAC 1980 

CAAGCATTCC GACATAAAAG CTACGGCTAT ATTGTTGGAG GAAGTGCTGA AGATTTTTCT 2040 

GAAAATATCT TCAGTGTAGC TTTCTGCCAG CTCTTCGGTA AAGATAAAGA CCTGTTTATA 2100 

GTTGAAAATA CCTCTCATAA CTATTTAGCG TCGCTATACC TGCAACATCG AGCATTCCTA 2160 

GGAGGACTTC CCATGCCCTC ATTTGGAAGT ATCACCGACA TGCTGAAAGA TATTCCTCTC 2220 

ATTTTGAATG CCCAGCTAAG CTACAGCTAC ACTAAAAATG ATATGGATAC TCGCTATACT 2280 

TCCTATCCTG AAGCTCAAGG TTCTTGGACC AATAATTCTG GGGCTCTAGA GCTCGGAGGA 2340 

TCTCTGGCTC TATATCTCCC TAAAGAAGCA CCGTTCTTCC AGGGATATTT CCCCTTCTTA 2400 
AAGTTCCAGG CAGTCTACAG CCGCCAACAA AACTTTAAAG AGAGTGGCGC TGAAGCCCGT 2460

GCTTTTGATG ATGGAGACCT AGTGAACTGC TCTATCCCTG TCGGCATTCG GTTAGAAAAA 2520 

ATCTCCGAAG ATGAAAAAAA TAATTTCGAG ATTTCTCTAG CCAACATTGG TGATGTGTAT 2580 

CGTAAAAATC CCCGTTCGCG TACTTCTCTA ATGGTCAGTG GAGCCTCTTG GACTTCGCTA 2640 

TGTAAAAACC TCGCACGACA AGCCTTCTTA GCAAGTGCTG GAAGCCATCT GACTCTCTCC 2700 

CCTCATGTAG AACTCTCTGG GGAAGCTGCT TATGAGCTTC GTGGCTCAGC ACACATCTAC 2760 

AATGTAGATT GTGGGCTAAG ATACTCATTC TAG 2793 

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 930 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:



Met Lys Ile Pro Leu His Lys Leu Leu Ile Ser Ser Thr Leu Val Thr
1 5 10 15

Pro Ile Leu Leu Ser Ile Ala Thr Tyr Gly Ala Asp Ala Ser Leu Ser
20 25 30

Pro Thr Asp Ser Phe Asp Gly Ala Gly Gly Ser Thr Phe Thr Pro Lys
35 40 45

Ser Thr Ala Asp Ala Asn Gly Thr Asn Tyr Val Leu Ser Gly Asn Val
50 55 60

Tyr Ile Asn Asp Ala Gly Lys Gly Thr Ala Leu Thr Gly Cys Cys Phe
65 70 75 80

Thr Glu Thr Thr Gly Asp Leu Thr Phe Thr Gly Lys Gly Tyr Ser Phe
85 90 95

Ser Phe Asn Thr Val Asp Ala Gly Ser Asn Ala Gly Ala Ala Ala Ser
100 105 110

Thr Thr Ala Asp Lys Ala Leu Thr Phe Thr Gly Phe Ser Asn Leu Ser
115 120 125

Phe Ile Ala Ala Pro Gly Thr Thr Val Ala Ser Gly Lys Ser Thr Leu
130 135 140

Ser Ser Ala Gly Ala Leu Asn Leu Thr Asp Asn Gly Thr Ile Leu Phe
145 150 155 160

Ser Gln Asn Val Ser Asn Glu Ala Asn Asn Asn Gly Gly Ala Ile Thr
165 170 175

Thr Lys Thr Leu Ser Ile Ser Gly Asn Thr Ser Ser Ile Thr Phe Thr
180 185 190

Ser Asn Ser Ala Lys Lys Leu Gly Gly Ala Ile Tyr Ser Ser Ala Ala
195 200 205

Ala Ser Ile Ser Gly Asn Thr Gly Gln Leu Val Phe Met Asn Asn Lys
210 215 220

Gly Glu Thr Gly Gly Gly Ala Leu Gly Phe Glu Ala Ser Ser Ser Ile
225 230 235 240

Thr Gln Asn Ser Ser Leu Phe Phe Ser Gly Asn Thr Ala Thr Asp Ala
245 250 255

Ala Gly Lys Gly Gly Ala Ile Tyr Cys Glu Lys Thr Gly Glu Thr Pro
260 265 270

Thr Leu Thr Ile Ser Gly Asn Lys Ser Leu Thr Phe Ala Glu Asn Ser
275 280 285

Ser Val Thr Gln Gly Gly Ala Ile Cys Ala His Gly Leu Asp Leu Ser
290 295 300

Ala Ala Gly Pro Thr Leu Phe Ser Asn Asn Arg Cys Gly Asn Thr Ala
305 310 315 320

Ala Gly Lys Gly Gly Ala Ile Ala Ile Ala Asp Ser Gly Ser Leu Ser
325 330 335

Leu Ser Ala Asn Gln Gly Asp Ile Thr Phe Leu Gly Asn Thr Leu Thr
340 345 350

Ser Thr Ser Ala Pro Thr Ser Thr Arg Asn Ala Ile Tyr Leu Gly Ser
355 360 365

Ser Ala Lys Ile Thr Asn Leu Arg Ala Ala Gln Gly Gln Ser Ile Tyr
370 375 380

Phe Tyr Asp Pro Ile Ala Ser Asn Thr Thr Gly Ala Ser Asp Val Leu
385 390 395 400

Thr Ile Asn Gln Pro Asp Ser Asn Ser Pro Leu Asp Tyr Ser Gly Thr
405 410 415

Ile Val Phe Ser Gly Glu Lys Leu Ser Ala Asp Glu Ala Lys Ala Ala
420 425 430

Asp Asn Phe Thr Ser Ile Leu Lys Gln Pro Leu Ala Leu Ala Ser Gly
435 440 445

Thr Leu Ala Leu Lys Gly Asn Val Glu Leu Asp Val Asn Gly Phe Thr
450 455 460

Gln Thr Glu Gly Ser Thr Leu Leu Met Gln Pro Gly Thr Lys Leu Lys
465 470 475 480

Ala Asp Thr Glu Ala Ile Ser Leu Thr Lys Leu Val Val Asp Leu Ser
485 490 495

Ala Leu Glu Gly Asn Lys Ser Val Ser Ile Glu Thr Ala Gly Ala Asn
500 505 510

Lys Thr Ile Thr Leu Thr Ser Pro Leu Val Phe Gln Asp Ser Ser Gly
515 520 525

Asn Phe Tyr Glu Ser His Thr Ile Asn Gln Ala Phe Thr Gln Pro Leu
530 535 540

Val Val Phe Thr Ala Ala Thr Ala Ala Ser Asp Ile Tyr Ile Asp Ala
545 550 555 560

Leu Leu Thr Ser Pro Val Gln Thr Pro Glu Pro His Tyr Gly Tyr Gln
565 570 575

Gly His Trp Glu Ala Thr Trp Ala Asp Thr Ser Thr Ala Lys Ser Gly
580 585 590

Thr Met Thr Trp Val Thr Thr Gly Tyr Asn Pro Asn Pro Glu Arg Arg
595 600 605

Ala Ser Val Val Pro Asp Ser Leu Trp Ala Ser Phe Thr Asp Ile Arg
610 615 620

Thr Leu Gln Gln Ile Met Thr Ser Gln Ala Asn Ser Ile Tyr Gln Gln
625 630 635 640

Arg Gly Leu Trp Ala Ser Gly Thr Ala Asn Phe Phe His Lys Asp Lys
645 650 655

Ser Gly Thr Asn Gln Ala Phe Arg His Lys Ser Tyr Gly Tyr Ile Val
660 665 670

Gly Gly Ser Ala Glu Asp Phe Ser Glu Asn Ile Phe Ser Val Ala Phe
675 680 685

Cys Gln Leu Phe Gly Lys Asp Lys Asp Leu Phe Ile Val Glu Asn Thr
690 695 700

Ser His Asn Tyr Leu Ala Ser Leu Tyr Leu Gln His Arg Ala Phe Leu
705 710 715 720

Gly Gly Leu Pro Met Pro Ser Phe Gly Ser Ile Thr Asp Met Leu Lys
725 730 735

Asp Ile Pro Leu Ile Leu Asn Ala Gln Leu Ser Tyr Ser Tyr Thr Lys
740 745 750

Asn Asp Met Asp Thr Arg Tyr Thr Ser Tyr Pro Glu Ala Gln Gly Ser
755 760 765

Trp Thr Asn Asn Ser Gly Ala Leu Glu Leu Gly Gly Ser Leu Ala Leu
770 775 780

Tyr Leu Pro Lys Glu Ala Pro Phe Phe Gln Gly Tyr Phe Pro Phe Leu
785 790 795 800

Lys Phe Gln Ala Val Tyr Ser Arg Gln Gln Asn Phe Lys Glu Ser Gly
805 810 815

Ala Glu Ala Arg Ala Phe Asp Asp Gly Asp Leu Val Asn Cys Ser Ile
820 825 830

Pro Val Gly Ile Arg Leu Glu Lys Ile Ser Glu Asp Glu Lys Asn Asn
835 840 845

Phe Glu Ile Ser Leu Ala Asn Ile Gly Asp Val Tyr Arg Lys Asn Pro
850 855 860

Arg Ser Arg Thr Ser Leu Met Val Ser Gly Ala Ser Trp Thr Ser Leu
865 870 875 880

Cys Lys Asn Leu Ala Arg Gln Ala Phe Leu Ala Ser Ala Gly Ser His
885 890 895

Leu Thr Leu Ser Pro His Val Glu Leu Ser Gly Glu Ala Ala Tyr Glu
900 905 910

Leu Arg Gly Ser Ala His Ile Tyr Asn Val Asp Cys Gly Leu Arg Tyr
915 920 925

Ser Phe
930



(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 840 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 



GAAGACAATA TAAGGTACCG TCATAACAGC GGGGGTTATG CACTAGGGAT CACAGCAACA 60 

ACTCCTGCCG AGGATCAGCT TACTTTTGCC TTCTGCCAGC TCTTTGCTAG AGATCGCAAT 120 

CATATTACAG GTAAGAACCA CGGAGATACT TACGGTGCCT CTTTGTATTT CCACCATACA 180 

GAAGGGCTCT TCGACATCGC CAATTTCCTC TGGGGAAAAG CAACCCGAGC TCCCTGGGTG 240 

CTCTCTGAGA TCTCCCAGAT CATTCCTTTA TCGTTCGATG CTAAATTCAG TTATCTCCAT 300 

ACAGACAACC ACATGAAGAC ATATTATACC GATAACTCTA TCATCAAGGG TTCTTGGAGA 360 

AACGATGCCT TCTGTGCAGA TCTTGGAGCT AGCCTGCCTT TTGTTATTTC CGTTCCGTAT 420 

CTTCTGAAAG AAGTCGAACC TTTTGTCAAA GTACAGTATA TCTATGCGCA TCAGCAAGAC 480 

TTCTACGAGC GTCATGCTGA AGGACGCGCT TTCAATAAAA GCGAGCTTAT CAACGTAGAG 540 

ATTCCTATAG GCGTCACCTT CGAAAGAGAC TCAAAATCAG AAAAGGGAAC TTACGATCTT 600 

ACTCTTATGT ATATACTCGA TGCTTACCGA CGCAATCCTA AATGTCAAAC TTCCCTAATA 660 

GCTAGCGATG CTAACTGGAT GGCCTATGGT ACCAACCTCG CACGACAAGG TTTTTCTGTT 720 

CGTGCTGCGA ACCATTTCCA AGTGAACCCC CACATGGAAA TCTTCGGTCA ATTCGCTTTT 780 

GAAGTACGAA GTTCTTCACG AAATTATAAT ACAAACCTAG GCTCTAAGTT TTGTTTCTAG 840 

(2) INFORMATION FOR SEQ ID NO: 18: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 279 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 



Glu Asp Asn Ile Arg Tyr Arg His Asn Ser Gly Gly Tyr Ala Leu Gly
1 5 10 15

Ile Thr Ala Thr Thr Pro Ala Glu Asp Gln Leu Thr Phe Ala Phe Cys
20 25 30

Gln Leu Phe Ala Arg Asp Arg Asn His Ile Thr Gly Lys Asn His Gly
35 40 45

Asp Thr Tyr Gly Ala Ser Leu Tyr Phe His His Thr Glu Gly Leu Phe
50 55 60

Asp Ile Ala Asn Phe Leu Trp Gly Lys Ala Thr Arg Ala Pro Trp Val
65 70 75 80

Leu Ser Glu Ile Ser Gln Ile Ile Pro Leu Ser Phe Asp Ala Lys Phe
85 90 95

Ser Tyr Leu His Thr Asp Asn His Met Lys Thr Tyr Tyr Thr Asp Asn
100 105 110

Ser Ile Ile Lys Gly Ser Trp Arg Asn Asp Ala Phe Cys Ala Asp Leu
115 120 125

Gly Ala Ser Leu Pro Phe Val Ile Ser Val Pro Tyr Leu Leu Lys Glu
130 135 140

Val Glu Pro Phe Val Lys Val Gln Tyr Ile Tyr Ala His Gln Gln Asp
145 150 155 160

Phe Tyr Glu Arg His Ala Glu Gly Arg Ala Phe Asn Lys Ser Glu Leu
165 170 175

Ile Asn Val Glu Ile Pro Ile Gly Val Thr Phe Glu Arg Asp Ser Lys
180 185 190

Ser Glu Lys Gly Thr Tyr Asp Leu Thr Leu Met Tyr Ile Leu Asp Ala
195 200 205

Tyr Arg Arg Asn Pro Lys Cys Gln Thr Ser Leu Ile Ala Ser Asp Ala
210 215 220

Asn Trp Met Ala Tyr Gly Thr Asn Leu Ala Arg Gln Gly Phe Ser Val
225 230 235 240

Arg Ala Ala Asn His Phe Gln Val Asn Pro His Met Glu Ile Phe Gly
245 250 255

Gln Phe Ala Phe Glu Val Arg Ser Ser Ser Arg Asn Tyr Asn Thr Asn
260 265 270

Leu Gly Ser Lys Phe Cys Phe
275



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1545 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

ATGACCATAC TTCGAAATTT TCTTACCTGC TCGGCTTTAT TCCTCGCTCT CCCTGCAGCA 60

GCACAAGTTG TATATCTTCA TGAAAGTGAT GGTTATAACG GTGCTATCAA TAATAAAAGC 120

TTAGAACCTA AAATTACCTG TTATCCAGAA GGAACTTCTT ACATCTTTCT AGATGACGTG 180

AGGATTTCCA ACGTTAAGCA TGATCAAGAA GATGCTGGGG TTTTTATAAA TCGATCTGGG 240

AATCTTTTTT TCATGGGCAA CCGTTGCAAC TTCACTTTTC ACAACCTTAT GACCGAGGGT 300

TTTGGCGCTG CCATTTCGAA CCGCGTTGGA GACACCACTC TCACTCTCTC TAATTTTTCT 360

TACTTAACGT TCACCTCAGC ACCTCTACTA CCTCAAGGAC AAGGAGCGAT TTATAGTCTT 420

GGTTCCGTGA TGATCGAAAA TAGTGAGGAA GTGACTTTCT GTGGGAACTA CTCTTCGTGG 480

AGTGGAGCTG CGATTTATAC TCCCTACCTT TTAGGTTCTA AGGCGAGTCG TCCTTCAGTA 540

AATCTCAGCG GGAACCGCTA CCTGGTGTTT AGAGACTATG TGAGCCAAGG TTATGGCGGC 600

GCCGTATCTA CCCACAATCT CACACTCACG ACTCGAGGAC CTTCGTGTTT TGAAAATAAT 660

CATGCTTATC ATGACGTGAA TAGTAATGGA GGAGCCATTG CCATTGCTCC TGGAGGATCG 720

ATCTCTATAT CCGTGAAAAG CGGAGATCTC ATCTTCAAAG GAAATACAGC ATCACAAGAC 780

GGAAATACAA TACACAACTC CATCCATCTG CAATCTGGAG CACAGTTTAA GAACCTACGT 840

GCTGTTTCAG AATCCGGAGT TTATTTCTAT GATCCTATAA GCCATAGCGA GTCGCATAAA 900

ATTACAGATC TTGTAATGAA TGCTCCTGAA GGAAAGGAAA CTTATGAAGG AACAATTAGC 960

TTCTCAGGAC TATGCCTGGA TGATCATGAA GTTTGTGCGG AAAATCTTAC TTCCACAATC 1020

CTACAAGATG TCACATTAGC AGGAGGAACT CTCTCTCTAT CGGATGGGGT TACCTTGCAA 1080

CTGCATTCTT TTAAGCAGGA AGCAAGCTCT ACGCTTACTA TGTCTCCAGG AACCACTCTG 1140

CTCTGCTCAG GAGATGCTCG GGTTCAGAAT CTGCACATCC TGATTGAAGA TACCGACAAC 1200

TTTGTTCCTG TAAGGATTCG CGCCGAGGAC AAGGATGCTC TTGTCTCATT AGAAAAACTT 1260

AAAGTTGCCT TTGAGGCTTA TTGGTCCGTC TATGACTTTC CTCAATTTAA GGAAGCCTTT 1320

ACGATTCCTC TTCTTGAACT TCTAGGGCCT TCTTTTGACA GTCTTCTCCT AGGGGAGACC 1380

ACTTTGGAGA GAACCCAAGT CACAACAGAG AATGACGCCG TTCGAGGTTT CTGGTCCCTA 1440

AGCTGGGAAG AGTACCCCCC TTCTCTGGAT AAAGACAGAA GGATCACACC AACTAAGAAA 1500

ACTGTTTTCC TCACTTGGAA TCCTGAGATC ACTTCTACGC CATAA 1545

(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 514 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

Met Thr Ile Leu Arg Asn Phe Leu Thr Cys Ser Ala Leu Phe Leu Ala
1 5 10 15

Leu Pro Ala Ala Ala Gln Val Val Tyr Leu His Glu Ser Asp Gly Tyr
20 25 30

Asn Gly Ala Ile Asn Asn Lys Ser Leu Glu Pro Lys Ile Thr Cys Tyr
35 40 45

Pro Glu Gly Thr Ser Tyr Ile Phe Leu Asp Asp Val Arg Ile Ser Asn
50 55 60

Val Lys His Asp Gln Glu Asp Ala Gly Val Phe Ile Asn Arg Ser Gly
65 70 75 80

Asn Leu Phe Phe Met Gly Asn Arg Cys Asn Phe Thr Phe His Asn Leu
85 90 95

Met Thr Glu Gly Phe Gly Ala Ala Ile Ser Asn Arg Val Gly Asp Thr
100 105 110

Thr Leu Thr Leu Ser Asn Phe Ser Tyr Leu Thr Phe Thr Ser Ala Pro
115 120 125

Leu Leu Pro Gln Gly Gln Gly Ala Ile Tyr Ser Leu Gly Ser Val Met
130 135 140

Ile Glu Asn Ser Glu Glu Val Thr Phe Cys Gly Asn Tyr Ser Ser Trp






145 150 155 160

Ser Gly Ala Ala Ile Tyr Thr Pro Tyr Leu Leu Gly Ser Lys Ala Ser
165 170 175

Arg Pro Ser Val Asn Leu Ser Gly Asn Arg Tyr Leu Val Phe Arg Asp
180 185 190

Tyr Val Ser Gln Gly Tyr Gly Gly Ala Val Ser Thr His Asn Leu Thr
195 200 205

Leu Thr Thr Arg Gly Pro Ser Cys Phe Glu Asn Asn His Ala Tyr His
210 215 220

Asp Val Asn Ser Asn Gly Gly Ala Ile Ala Ile Ala Pro Gly Gly Ser
225 230 235 240

Ile Ser Ile Ser Val Lys Ser Gly Asp Leu Ile Phe Lys Gly Asn Thr
245 250 255

Ala Ser Gln Asp Gly Asn Thr Ile His Asn Ser Ile His Leu Gln Ser
260 265 270

Gly Ala Gln Phe Lys Asn Leu Arg Ala Val Ser Glu Ser Gly Val Tyr
275 280 285

Phe Tyr Asp Pro Ile Ser His Ser Glu Ser His Lys Ile Thr Asp Leu
290 295 300

Val Ile Asn Ala Pro Glu Gly Lys Glu Thr Tyr Glu Gly Thr Ile Ser
305 310 315 320

Phe Ser Gly Leu Cys Leu Asp Asp His Glu Val Cys Ala Glu Asn Leu
325 330 335

Thr Ser Thr Ile Leu Gln Asp Val Thr Leu Ala Gly Gly Thr Leu Ser
340 345 350

Leu Ser Asp Gly Val Thr Leu Gln Leu His Ser Phe Lys Gln Glu Ala
355 360 365

Ser Ser Thr Leu Thr Met Ser Pro Gly Thr Thr Leu Leu Cys Ser Gly
370 375 380

Asp Ala Arg Val Gln Asn Leu His Ile Leu Ile Glu Asp Thr Asp Asn
385 390 395 400

Phe Val Pro Val Arg Ile Arg Ala Glu Asp Lys Asp Ala Leu Val Ser
405 410 415

Leu Glu Lys Leu Lys Val Ala Phe Glu Ala Tyr Trp Ser Val Tyr Asp
420 425 430

Phe Pro Gln Phe Lys Glu Ala Phe Thr Ile Pro Leu Leu Glu Leu Leu
435 440 445

Gly Pro Ser Phe Asp Ser Leu Leu Leu Gly Glu Thr Thr Leu Glu Arg
450 455 460

Thr Gln Val Thr Thr Glu Asn Asp Ala Val Arg Gly Phe Trp Ser Leu
465 470 475 480

Ser Trp Glu Glu Tyr Pro Pro Ser Leu Asp Lys Asp Arg Arg Ile Thr
485 490 495

Pro Thr Lys Lys Thr Val Phe Leu Thr Trp Asn Pro Glu Ile Thr Ser
500 505 510

Thr Pro



(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 787 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic DNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 

ATGAAAACGT CTATTCGTAA GTTCTTAATT TCTACCACAC TGGCGCCATG TTTTGCTTCA 60 

ACAGCGTTTA CTGTAGAAGT TATCATGCCT TCCGAGAACT TTGATGGATC GAGTGGGAAG 120 

ATTTTTCCTT ACACAACACT TTCTGATCCT AGAGGGACAC TCTGTATTTT TTCAGGGGAT 180 

CTCTACATTG CGAATCTTGA TAATGCCATA TCCAGAACCT CTTCCAGTTG CTTTAGCAAT 240 

AGGGCGGGAG CACTACAAAT CTTAGGAAAA GGTGGGGTTT TCTCCTTCTT AAATATCCGT 300 

TCTTCAGCTG ACGGAGCCGC GATTAGTAGT GTAATCACCC AAAATCCTGA ACTATGTCCC 360 

TTGAGTTTTT CAGGATTTAG TCAGATGATC TTCGATAACT GTGAATCTTT GACTTCAGAT 420 

ACCTCAGCGA GTAATGTCAT ACCTCACGCA TCGGCGATTT ACGCTACAAC GCCCATGCTC 480 

TTTACAAACA ATGACTCCAT ACTATTCCAA TACAACCGTT CTGCAGGATT TGGAGCTGCC 540 

ATTCGAGGCA CAAGCATCAC AATAGAAAAT ACGAAAAAGA GCCTTCTCTT TAATGGTAAT 600 

GGATCCATCT CTAATGGAGG GGCCCTCACG GGATCTGCAG CGATCAACCT CATCAACAAT 660 

AGCGCTCCTG TGATTTTCTC AACGAATGCT ACAGGGATCT ATGGTGGGGC TATTTACCTT 720 

AGGGGAGGAT CTATGCTGAG GTGTGGGAAG GTGTGAGGAG TCTTGTTGGT TTATAATAGG 780 

TCGCGCT 787 

(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 262 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 



Met Lys Thr Ser Ile Arg Lys Phe Leu Ile Ser Thr Thr Leu Ala Pro 
1 5 10 15 

Cys Phe Ala Ser Thr Ala Phe Thr Val Glu Val Ile Met Pro Ser Glu 
20 25 30 

Asn Phe Asp Gly Ser Ser Gly Lys Ile Phe Pro Tyr Thr Thr Leu Ser 
35 40 45 

Asp Pro Arg Gly Thr Leu Cys Ile Phe Ser Gly Asp Leu Tyr Ile Ala 
50 55 60 

Asn Leu Asp Asn Ala Ile Ser Arg Thr Ser Ser Ser Cys Phe Ser Asn 
65 70 75 80 

Arg Ala Gly Ala Leu Gln Ile Leu Gly Lys Gly Gly Val Phe Ser Phe 
85 90 95 

Leu Asn Ile Arg Ser Ser Ala Asp Gly Ala Ala Ile Ser Ser Val Ile 
100 105 110 

Thr Gln Asn Pro Glu Leu Cys Pro Leu Ser Phe Ser Gly Phe Ser Gln 
115 120 125 

Met Ile Phe Asp Asn Cys Glu Ser Leu Thr Ser Asp Thr Ser Ala Ser 
130 135 140 

Asn Val Ile Pro His Ala Ser Ala Ile Tyr Ala Thr Thr Pro Met Leu 
145 150 155 160 

Phe Thr Asn Asn Asp Ser Ile Leu Phe Gln Tyr Asn Arg Ser Ala Gly 
165 170 175 

Phe Gly Ala Ala Ile Arg Gly Thr Ser Ile Thr Ile Glu Asn Thr Lys 
180 185 190 

Lys Ser Leu Leu Phe Asn Gly Asn Gly Ser Ile Ser Asn Gly Gly Ala 
195 200 205 

Leu Thr Gly Ser Ala Ala Ile Asn Leu Ile Asn Asn Ser Ala Pro Val 
210 215 220 

Ile Phe Ser Thr Asn Ala Thr Gly Ile Tyr Gly Gly Ala Ile Tyr Leu 
225 230 235 240 

Thr Gly Gly Ser Met Leu Thr Ser Gly Asn Leu Ser Gly Val Leu Phe 
245 250 255 

Val Tyr Asn Ser Ser Arg 
260 

(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2838 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23 : 

ATGAAGACTT CAGTTTCTAT GTTGTTGGCC CTGCTTTGCT CGGGGGCTAG CTCTATTGTA 60 

CTCCATGCCG CAACCACTCC ACTAAATCCT GAAGATGGGT TTATTGGGGA GGGCAATACA 120 

AATACTTTTT CTCCGAAATC TACAACGGAT GCTGCAGGAA CTACCTACTC TCTCACAGGA 180 

GAGGTTCTGT TTATAGATCC GGGGAAAGGT GGTTCAATTA CAGGAACTTG CTTTGTAGAA 240 

ACTGCTGGCG ATCTTACATT TTTAGGTAAT GGAAATACCC TAAAGTTCCT GTCGGTAGAT 300 

GCAGGTGCTA ATATCGCGGT TGCTCATGTA CAAGGAAGTA AGAATTTAAG CTTCACAGAT 360 

TTCCTTTCTC TGGTGATCAC AGAATCTCCA AAATCCGCTG TTAGTACAGG AAAAGGTAGC 420 

CTAGTCAGTT CAGGTGCAGT CCAACTGCAA GATATAAACA CTCTAGTTCT TACAAGCAAT 480 

GCCTCTGTCG AAGATGGTGG CGTGATTAAA GGAAACTCCT GCTTGATTCA GGGAATCAAA 540 

AATAGTGCGA TTTTTGGACA AAATACATCT TCGAAAAAAG GAGGGGCGAT CTCCACGACT 600 

CAAGGACTCA CCATAGAGAA TAACTTAGGG ACGCTAAAGT TCAATGAAAA CAAAGCAGTG 660 

ACCTCAGGAG GCGCCTTAGA TTTAGGAGCC GCGTCTACAT TCACTGCGAA CCATGAGTTG 720 

ATATTTTCAC AAAATAAGAC TTCTGGGAAT GCTGCAAATG GCGGAGCCAT AAATTGCTCA 780 

GGCGACCTAA CATTTACTGA TAACACTTCT TTGTTACTTC AAGAAAATAG CACAATGCAG 840 

GATGGTGGAG CTTTGTGTAG CACAGGAACC ATAAGCATTA CCGGTAGTGA TTCTATCAAT 900 

GTGATAGGAA ATACTTCAGG ACAAAAAGGA GGAGCGATTT CTGCAGCTTC TCTCAAGATT 960 

TTGGGAGGGC AGGGAGGCGC TCTCTTTTCT AATAACGTAG TGACTCATGC CACCCCTCTA 1020 

GGAGGTGCCA TTTTTATCAA CACAGGAGGA TCCTTGCAGC TCTTCACTCA AGGAGGGGAT 1080 

ATCGTATTCG AGGGGAATCA GGTCACTACA ACAGCTCCAA ATGCTACCAC TAAGAGAAAT 1140 

GTAATTCACC TCGAGAGCAC CGCGAAGTGG ACGGGACTTG CTGCAAGTCA AGGTAACGCT 1200 

ATCTATTTCT ATGATCCCAT TACCACCAAC GATACGGGAG CAAGCGATAA CTTACGTATC 1260 

AATGAGGTCA GTGCAAATCA AAAGCTCTCG GGATCTATAG TATTTTCTGG AGAGAGATTG 1320 

TCGACAGCAG AAGCTATAGC TGAAAATCTT ACTTCGAGGA TCAACCAGCC TGTCACTTTA 1380 

GTAGAGGGGA GCTTAGAACT TAAACAGGGA GTGACCTTGA TCACACAAGG ATTCTCGCAG 1440 

GAGCCAGAAT CCACGCTTCT TTTGGATTTG GGGACCTCAT TACAAGCTTC TACAGAAGAT 1500 

ATCGTCATCA CAAATTCATC TATAAATGCC GATACCATTT ACGGAAAGAA TCCAATCAAT 1560 

ATTGTAGCTT CAGCAGCGAA TAAGAACATT ACCCTAACAG GAACCTTAGC ACTTGTAAAT 1620 

GCAGATGGAG CTTTGTATGA GAACCATACC TTGCAAGACT CTCAAGATTA TAGCTTTGTA 1680 

AAGTTATCTC CAGGAGCGGG AGGGACTATA ATTACTCAAG ATGCTTCTCA GAAGCTTCTT 1740 

GAAGTAGCTC CTTCTAGACC ACATTATGGC TATCAAGGAC ATTGGAATGT GCAAGTCATC 1800 

CCAGGAACGG GAACTCAACC GAGCCAGGCA AATTTAGAAT GGGTGCGGAC AGGATACCTT 1860 

CCGAATCCCG AACGGCAAGG ATTTTTAGTT CCCAATAGCC TGTGGGGTTC TTTTGTTGAT 1920 

CAGCGTGCTA TCCAAGAAAT CATGGTAAAT AGTAGCCAAA TCTTATGTCA GGAACGGGGA 1980 

GTCTGGGGAG CTGGAATTGC TAATTTCCTA CATAGAGATA AAATTAATGA GCACGGCTAT 2040 

CGCCATAGCG GTGTCGGTTA TCTTGTGGGA GTTGGCACTC ATGCTTTTTC TGATGCTACG 2100 

ATAAATGCGG CTTTTTGCCA GCTCTTCAGT AGAGATAAAG ACTACGTAGT ATCCAAAAAT 2160 

CATGGAACTA GCTACTCAGG GGTCGTATTT CTTGAGGATA CCCTAGAGTT TAGAAGTCCA 2220 

CAGGGATTCT ATACTGATAG CTCCTCAGAA GCTTGCTGTA ACCAAGTCGT CACTATAGAT 2280 
ATGCAGTTGT CTTACAGCCA TAGAAATAAT GATATGAAAA CCAAATACAC GACATATCCA 2340 

GAAGCTCAGG GATCTTGGGC AAATGATGTT TTTGGTCTTG AGTTTGGAGC GACTACATAC 2400 

TACTACCCTA ACAGTACTTT TTTATTTGAT TACTACTCTC CGTTTCTCAG GCTGCAGTGC 2460 

ACCTATGCTC ACCAGGAAGA CTTCAAAGAG ACAGGAGGTG AGGTTCGTCA CTTTACTAGC 2520 

GGAGATCTTT TCAATTTAGC AGTTCCTATT GGCGTGAAGT TTGAGAGATT TTCAGACTGT 2580 

AAAAGGGGAT CTTATGAACT TACCCTTGCT TATGTTCCTG ATGTGATTCG CAAAGATCCC 2640 

AAGAGCACGG CAACATTGGC TAGTGGAGCT ACGTGGAGCA CCCACGGAAA CAATCTCTCC 2700 

AGACAAGGAT TACAACTGCG TTTAGGGAAC CACTGTCTCA TAAATCCTGG AATTGAGGTG 2760 

TTCAGTCACG GAGCTATTGA ATTGCGGGGA TCCTCTCGTA ATTATAACAT CAATCTCGGG 2820 

GGTAAATACC GATTTTAA 2838 

(2) INFORMATION FOR SEQ ID NO:24: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 946 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24: 



Met Lys Thr Ser Val Ser Met Leu Leu Ala Leu Leu Cys Ser Gly Ala 
1 5 10 15 

Ser Ser Ile Val Leu His Ala Ala Thr Thr Pro Leu Asn Pro Glu Asp 
20 25 30 

Gly Phe Ile Gly Glu Gly Asn Thr Asn Thr Phe Ser Pro Lys Ser Thr 
35 40 45 

Thr Asp Ala Ala Gly Thr Thr Tyr Ser Leu Thr Gly Glu Val Leu Phe 
50 55 60 

Ile Asp Pro Gly Lys Gly Gly Ser Ile Thr Gly Thr Cys Phe Val Glu 
65 70 75 80 

Thr Ala Gly Asp Leu Thr Phe Leu Gly Asn Gly Asn Thr Leu Lys Phe 
85 90 95 

Leu Ser Val Asp Ala Gly Ala Asn Ile Ala Val Ala His Val Gln Gly 
100 105 110 

Ser Lys Asn Leu Ser Phe Thr Asp Phe Leu Ser Leu Val Ile Thr Glu 
115 120 125 

Ser Pro Lys Ser Ala Val Ser Thr Gly Lys Gly Ser Leu Val Ser Ser 
130 135 140 

Gly Ala Val Gln Leu Gln Asp Ile Asn Thr Leu Val Leu Thr Ser Asn 
145 150 155 160 

Ala Ser Val Glu Asp Gly Gly Val Ile Lys Gly Asn Ser Cys Leu Ile 
165 170 175 

Gln Gly Ile Lys Asn Ser Ala Ile Phe Gly Gln Asn Thr Ser Ser Lys 
180 185 190 

Lys Gly Gly Ala Ile Ser Thr Thr Gln Gly Leu Thr Ile Glu Asn Asn 
195 200 205 

Leu Gly Thr Leu Lys Phe Asn Glu Asn Lys Ala Val Thr Ser Gly Gly 
210 215 220 

Ala Leu Asp Leu Gly Ala Ala Ser Thr Phe Thr Ala Asn His Glu Leu 
225 230 235 240 

Ile Phe Ser Gln Asn Lys Thr Ser Gly Asn Ala Ala Asn Gly Gly Ala 
245 250 255 

Ile Asn Cys Ser Gly Asp Leu Thr Phe Thr Asp Asn Thr Ser Leu Leu 
260 265 270 

Leu Gln Glu Asn Ser Thr Met Gln Asp Gly Gly Ala Leu Cys Ser Thr 
275 280 285 

Gly Thr Ile Ser Ile Thr Gly Ser Asp Ser Ile Asn Val Ile Gly Asn 
290 295 300 

Thr Ser Gly Gln Lys Gly Gly Ala Ile Ser Ala Ala Ser Leu Lys Ile 
305 310 315 320 

Leu Gly Gly Gln Gly Gly Ala Leu Phe Ser Asn Asn Val Val Thr His 
325 330 335 

Ala Thr Pro Leu Gly Gly Ala Ile Phe Ile Asn Thr Gly Gly Ser Leu 
340 345 350 

Gln Leu Phe Thr Gln Gly Gly Asp Ile Val Phe Glu Gly Asn Gln Val 
355 360 365 

Thr Thr Thr Ala Pro Asn Ala Thr Thr Lys Arg Asn Val Ile His Leu 
370 375 380 

Glu Ser Thr Ala Lys Trp Thr Gly Leu Ala Ala Ser Gln Gly Asn Ala 
385 390 395 400 

Ile Tyr Phe Tyr Asp Pro Ile Thr Thr Asn Asp Thr Gly Ala Ser Asp 
405 410 415 

Asn Leu Arg Ile Asn Glu Val Ser Ala Asn Gln Lys Leu Ser Gly Ser 
420 425 430 

Ile Val Phe Ser Gly Glu Arg Leu Ser Thr Ala Glu Ala Ile Ala Glu 
435 440 445 

Asn Leu Thr Ser Arg Ile Asn Gln Pro Val Thr Leu Val Glu Gly Ser 
450 455 460 

Leu Glu Leu Lys Gln Gly Val Thr Leu Ile Thr Gln Gly Phe Ser Gln 
465 470 475 480 

Glu Pro Glu Ser Thr Leu Leu Leu Asp Leu Gly Thr Ser Leu Gln Ala 
485 490 495 

Ser Thr Glu Asp Ile Val Ile Thr Asn Ser Ser Ile Asn Ala Asp Thr 
500 505 510 

Ile Tyr Gly Lys Asn Pro Ile Asn Ile Val Ala Ser Ala Ala Asn Lys 
515 520 525 

Asn Ile Thr Leu Thr Gly Thr Leu Ala Leu Val Asn Ala Asp Gly Ala 
530 535 540 

Leu Tyr Glu Asn His Thr Leu Gln Asp Ser Gln Asp Tyr Ser Phe Val 
545 550 555 560 

Lys Leu Ser Pro Gly Ala Gly Gly Thr Ile Ile Thr Gln Asp Ala Ser 
565 570 575 

Gln Lys Leu Leu Glu Val Ala Pro Ser Arg Pro His Tyr Gly Tyr Gln 
580 585 590 

Gly His Trp Asn Val Gln Val Ile Pro Gly Thr Gly Thr Gln Pro Ser 
595 600 605 

Gln Ala Asn Leu Glu Trp Val Arg Thr Gly Tyr Leu Pro Asn Pro Glu 
610 615 620 

Arg Gln Gly Phe Leu Val Pro Asn Ser Leu Trp Gly Ser Phe Val Asp 
625 630 635 640 

Gln Arg Ala Ile Gln Glu Ile Met Val Asn Ser Ser Gln Ile Leu Cys 
645 650 655 

Gln Glu Arg Gly Val Trp Gly Ala Gly Ile Ala Asn Phe Leu His Arg 
660 665 670 

Asp Lys Ile Asn Glu His Gly Tyr Arg His Ser Gly Val Gly Tyr Leu 
675 680 685 

Val Gly Val Gly Thr His Ala Phe Ser Asp Ala Thr Ile Asn Ala Ala 
690 695 700 

Phe Cys Gln Leu Phe Ser Arg Asp Lys Asp Tyr Val Val Ser Lys Asn 
705 710 715 720 

His Gly Thr Ser Tyr Ser Gly Val Val Phe Leu Glu Asp Thr Leu Glu 
725 730 735 

Phe Arg Ser Pro Gln Gly Phe Tyr Thr Asp Ser Ser Ser Glu Ala Cys 
740 745 750 

Cys Asn Gln Val Val Thr Ile Asp Met Gln Leu Ser Tyr Ser His Arg 
755 760 765 

Asn Asn Asp Met Lys Thr Lys Tyr Thr Thr Tyr Pro Glu Ala Gln Gly 
770 775 780 

Ser Trp Ala Asn Asp Val Phe Gly Leu Glu Phe Gly Ala Thr Thr Tyr 
785 790 795 800 

Tyr Tyr Pro Asn Ser Thr Phe Leu Phe Asp Tyr Tyr Ser Pro Phe Leu 
805 810 815 

Arg Leu Gln Cys Thr Tyr Ala His Gln Glu Asp Phe Lys Glu Thr Gly 
820 825 830 

Gly Glu Val Arg His Phe Thr Ser Gly Asp Leu Phe Asn Leu Ala Val 
835 840 845 

Pro Ile Gly Val Lys Phe Glu Arg Phe Ser Asp Cys Lys Arg Gly Ser 
850 855 860 

Tyr Glu Leu Thr Leu Ala Tyr Val Pro Asp Val Ile Arg Lys Asp Pro 
865 870 875 880 

Lys Ser Thr Ala Thr Leu Ala Ser Gly Ala Thr Trp Ser Thr His Gly 
885 890 895 

Asn Asn Leu Ser Arg Gln Gly Leu Gln Leu Arg Leu Gly Asn His Cys 
900 905 910 

Leu Ile Asn Pro Gly Ile Glu Val Phe Ser His Gly Ala Ile Glu Leu 
915 920 925 

Arg Gly Ser Ser Arg Asn Tyr Asn Ile Asn Leu Gly Gly Lys Tyr Arg 
930 935 940 

Phe 
945 



(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3000 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ix) FEATURE: 

(A) NAME/KEY: Coding Sequence 

(B) LOCATION: 259... 3000 
(D) OTHER INFORMATION: 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 

ATCAGGTGAT AAAAGTTCCT CGTTAGCTAG TGACTGTAGG TGACATGAGA AAGCTAACAC 
GGAGGAAACT AAAACCCAAG GAATCGAAGT CTTCATGGTA ATGCTTTTGT TTTTTAGAGA 
ACTATTCGCA TCAATATAGA AACAAAATAA GTAAATCAAG TTAAAGATGA CAAAACAGCT 
GTCAAGAATT TTTATCTTGA CTCTCTGAGT TTTCTATTTT ATATGACGCA AGTAAGAATT 
TAATAATAAA GTGGGTTT ATG AAA TCG CAA TTT TCC TGG TTA GTG CTC TCT 

Met Lys Ser Gln Phe Ser Trp Leu Val Leu Ser 
1 5 10 
TCG ACA TTG GCA TGT TTT ACT AGT TGT TCC ACT GTT TTT GCT GCA ACT 339 
Ser Thr Leu Ala Cys Phe Thr Ser Cys Ser Thr Val Phe Ala Ala Thr 
15 20 25 

GCT GAA AAT ATA GGC CCC TCT GAT AGC TTT GAC GGA AGT ACT AAC ACA 387 
Ala Glu Asn Ile Gly Pro Ser Asp Ser Phe Asp Gly Ser Thr Asn Thr 
30 35 40 

GGC ACC TAT ACT CCT AAA AAT ACG ACT ACT GGA ATA GAC TAT ACT CTG 435 
Gly Thr Tyr Thr Pro Lys Asn Thr Thr Thr Gly Ile Asp Tyr Thr Leu 
45 50 55 

ACA GGA GAT ATA ACT CTG CAA AAC CTT GGG GAT TCG GCA GCT TTA ACG 483 
Thr Gly Asp Ile Thr Leu Gln Asn Leu Gly Asp Ser Ala Ala Leu Thr 
60 65 70 75 

AAG GGT TGT TTT TCT GAC ACT ACG GAA TCT TTA AGC TTT GCC GGT AAG 531 
Lys Gly Cys Phe Ser Asp Thr Thr Glu Ser Leu Ser Phe Ala Gly Lys 
80 85 90 

GGG TAC TCA CTT TCT TTT TTA AAT ATT AAG TCT AGT GCT GAA GGC GCA 579 
Gly Tyr Ser Leu Ser Phe Leu Asn Ile Lys Ser Ser Ala Glu Gly Ala 
95 100 105 

GCA CTT TCT GTT ACA ACT GAT AAA AAT CTG TCG CTA ACA GGA TTT TCG 627 
Ala Leu Ser Val Thr Thr Asp Lys Asn Leu Ser Leu Thr Gly Phe Ser 
110 115 120 

AGT CTT ACT TTC TTA GCG GCC CCA TCA TCG GTA ATC ACA ACC CCC TCA 675 
Ser Leu Thr Phe Leu Ala Ala Pro Ser Ser Val Ile Thr Thr Pro Ser 
125 130 135 

GGA AAA GGT GCA GTT AAA TGT GGA GGG GAT CTT ACA TTT GAT AAC AAT 723 
Gly Lys Gly Ala Val Lys Cys Gly Gly Asp Leu Thr Phe Asp Asn Asn 
140 145 150 155 

GGA ACT ATT TTA TTT AAA CAA GAT TAC TGT GAG GAA AAT GGC GGA GCC 771 
Gly Thr Ile Leu Phe Lys Gln Asp Tyr Cys Glu Glu Asn Gly Gly Ala 
160 165 170 

ATT TCT ACC AAG AAT CTT TCT TTG AAA AAC AGC ACG GGA TCG ATT TCT 819 
Ile Ser Thr Lys Asn Leu Ser Leu Lys Asn Ser Thr Gly Ser Ile Ser 
175 180 185 

TTT GAA GGG AAT AAA TCG AGC GCA ACA GGG AAA AAA GGT GGG GCT ATT 867 
Phe Glu Gly Asn Lys Ser Ser Ala Thr Gly Lys Lys Gly Gly Ala Ile 
190 195 200 

TGT GCT ACT GGT ACT GTA GAT ATT ACA AAT AAT ACG GCT CCT ACC CTC 915 
Cys Ala Thr Gly Thr Val Asp Ile Thr Asn Asn Thr Ala Pro Thr Leu 
205 210 215 

TTC TCG AAC AAT ATT GCT GAA GCT GCA GGT GGA GCT ATA AAT AGC ACA 963 
Phe Ser Asn Asn Ile Ala Glu Ala Ala Gly Gly Ala Ile Asn Ser Thr 
220 225 230 235 

GGA AAC TGT ACA ATT ACA GGG AAT ACG TCT CTT GTA TTT TCT GAA AAT 1011 
Gly Asn Cys Thr Ile Thr Gly Asn Thr Ser Leu Val Phe Ser Glu Asn 
240 245 250 

AGT GTG ACA GCG ACC GCA GGA AAT GGA GGA GCT CTT TCT GGA GAT GCC 1059 
Ser Val Thr Ala Thr Ala Gly Asn Gly Gly Ala Leu Ser Gly Asp Ala 
255 260 265 

GAT GTT ACC ATA TCT GGG AAT CAG AGT GTA ACT TTC TCA GGA AAC CAA 1107 
Asp Val Thr Ile Ser Gly Asn Gln Ser Val Thr Phe Ser Gly Asn Gln 
270 275 280 

GCT GTA GCT AAT GGC GGA GCC ATT TAT GCT AAG AAG CTT ACA CTG GCT 1155 
Ala Val Ala Asn Gly Gly Ala Ile Tyr Ala Lys Lys Leu Thr Leu Ala 
285 290 295 

TCC GGG GGG GGG GGG GGT ATC TCC TTT TCT AAC AAT ATA GTC CAA GGT 1203 
Ser Gly Gly Gly Gly Gly Ile Ser Phe Ser Asn Asn Ile Val Gln Gly 
300 305 310 315 

ACC ACT GCA GGT AAT GGT GGA GCC ATT TCT ATA CTG GCA GCT GGA GAG 1251 
Thr Thr Ala Gly Asn Gly Gly Ala Ile Ser Ile Leu Ala Ala Gly Glu 
320 325 330 

TGT AGT CTT TCA GCA GAA GCA GGG GAC ATT ACC TTC AAT GGG AAT GCC 1299 
Cys Ser Leu Ser Ala Glu Ala Gly Asp Ile Thr Phe Asn Gly Asn Ala 
335 340 345 

ATT GTT GCA ACT ACA CCA CAA ACT ACA AAA AGA AAT TCT ATT GAC ATA 1347 
Ile Val Ala Thr Thr Pro Gln Thr Thr Lys Arg Asn Ser Ile Asp Ile 
350 355 360 

GGA TCT ACT GCA AAG ATC ACG AAT TTA CGT GCA ATA TCT GGG CAT AGC 1395 
Gly Ser Thr Ala Lys Ile Thr Asn Leu Arg Ala Ile Ser Gly His Ser 
365 370 375 

ATC TTT TTC TAC GAT CCG ATT ACT GCT AAT ACG GCT GCG GAT TCT ACA 1443 
Ile Phe Phe Tyr Asp Pro Ile Thr Ala Asn Thr Ala Ala Asp Ser Thr 
380 385 390 395 

GAT ACT TTA AAT CTC AAT AAG GCT GAT GCA GGT AAT AGT ACA GAT TAT 1491 
Asp Thr Leu Asn Leu Asn Lys Ala Asp Ala Gly Asn Ser Thr Asp Tyr 
400 405 410 

AGT GGG TCG ATT GTT TTT TCT GGT GAA AAG CTC TCT GAA GAT GAA GCA 1539 
Ser Gly Ser Ile Val Phe Ser Gly Glu Lys Leu Ser Glu Asp Glu Ala 
415 420 425 

AAA GTT GCA GAC AAC CTC ACT TCT ACG CTG AAG CAG CCT GTA ACT CTA 1587 
Lys Val Ala Asp Asn Leu Thr Ser Thr Leu Lys Gln Pro Val Thr Leu 
430 435 440 

ACT GCA GGA AAT TTA GTA CTT AAA CGT GGT GTC ACT CTC GAT ACG AAA 1635 
Thr Ala Gly Asn Leu Val Leu Lys Arg Gly Val Thr Leu Asp Thr Lys 
445 450 455 

GGC TTT ACT CAG ACC GCG GGT TCC TCT GTT ATT ATG GAT GCG GGC ACA 1683 
Gly Phe Thr Gln Thr Ala Gly Ser Ser Val Ile Met Asp Ala Gly Thr 
460 465 470 475 

ACG TTA AAA GCA AGT ACA GAG GAG GTC ACT TTA ACA GGT CTT TCC ATT 1731 
Thr Leu Lys Ala Ser Thr Glu Glu Val Thr Leu Thr Gly Leu Ser Ile 
480 485 490 

CCT GTA GAC TCT TTA GGC GAG GGT AAG AAA GTT GTA ATT GCT GCT TCT 1779 
Pro Val Asp Ser Leu Gly Glu Gly Lys Lys Val Val Ile Ala Ala Ser 
495 500 505 

GCA GCA AGT AAA AAT GTA GCC CTT AGT GGT CCG ATT CTT CTT TTG GAT 1827 
Ala Ala Ser Lys Asn Val Ala Leu Ser Gly Pro Ile Leu Leu Leu Asp 
510 515 520 

AAC CAA GGG AAT GCT TAT GAA AAT CAC GAC TTA GGA AAA ACT CAA GAC 1875 
Asn Gln Gly Asn Ala Tyr Glu Asn His Asp Leu Gly Lys Thr Gln Asp 
525 530 535 

TTT TCA TTT GTG CAG CTC TCT GCT CTG GGT ACT GCA ACA ACT ACA GAT 1923 
Phe Ser Phe Val Gln Leu Ser Ala Leu Gly Thr Ala Thr Thr Thr Asp 
540 545 550 555 

GTT CCA GCG GTT CCT ACA GTA GCA ACT CCT ACG CAC TAT GGG TAT CAA 1971 
Val Pro Ala Val Pro Thr Val Ala Thr Pro Thr His Tyr Gly Tyr Gln 
560 565 570 

GGT ACT TGG GGA ATG ACT TGG GTT GAT GAT ACC GCA AGC ACT CCA AAG 2019 
Gly Thr Trp Gly Met Thr Trp Val Asp Asp Thr Ala Ser Thr Pro Lys 
575 580 585 

ACT AAG ACA GCG ACA TTA GCT TGG ACC AAT ACA GGC TAC CTT CCG AAT 2067 
Thr Lys Thr Ala Thr Leu Ala Trp Thr Asn Thr Gly Tyr Leu Pro Asn 
590 595 600 

CCT GAG CGT CAA GGA CCT TTA GTT CCT AAT AGC CTT TGG GGA TCT TTT 2115 
Pro Glu Arg Gln Gly Pro Leu Val Pro Asn Ser Leu Trp Gly Ser Phe 
605 610 615 

TCA GAC ATC CAA GCG ATT CAA GGT GTC ATA GAG AGA AGT GCT TTG ACT 2163 
Ser Asp Ile Gln Ala Ile Gln Gly Val Ile Glu Arg Ser Ala Leu Thr 
620 625 630 635 

CTT TGT TCA GAT CGA GGC TTC TGG GCT GCG GGA GTC GCC AAT TTC TTA 2211 
Leu Cys Ser Asp Arg Gly Phe Trp Ala Ala Gly Val Ala Asn Phe Leu 
640 645 650 

GAT AAA GAT AAG AAA GGG GAA AAA CGC AAA TAC CGT CAT AAA TCT GGT 2259 
Asp Lys Asp Lys Lys Gly Glu Lys Arg Lys Tyr Arg His Lys Ser Gly 
655 660 665 

GGA TAT GCT ATC GGA GGT GCA GCG CAA ACT TGT TCT GAA AAC TTA ATT 2307 
Gly Tyr Ala Ile Gly Gly Ala Ala Gln Thr Cys Ser Glu Asn Leu Ile 
670 675 680 

AGC TTT GCC TTT TGC CAA CTC TTT GGT AGC GAT AAA GAT TTC TTA GTC 2355 
Ser Phe Ala Phe Cys Gln Leu Phe Gly Ser Asp Lys Asp Phe Leu Val 
685 690 695 
GCT AAA AAT CAT ACT GAT ACC TAT GCA GGA GCC TTC TAT ATC CAA CAC 2403 
Ala Lys Asn His Thr Asp Thr Tyr Ala Gly Ala Phe Tyr Ile Gln His 
700 705 710 715 

ATT ACA GAA TGT AGT GGG TTC ATA GGT TGT CTC TTA GAT AAA CTT CCT 2451 
Ile Thr Glu Cys Ser Gly Phe Ile Gly Cys Leu Leu Asp Lys Leu Pro 
720 725 730 

GGC TCT TGG AGT CAT AAA CCC CTC GTT TTA GAA GGG CAG CTC GCT TAT 2499 
Gly Ser Trp Ser His Lys Pro Leu Val Leu Glu Gly Gln Leu Ala Tyr 
735 740 745 

AGC CAC GTC AGT AAT GAT CTG AAG ACA AAG TAT ACT GCG TAT CCT GAG 2547 
Ser His Val Ser Asn Asp Leu Lys Thr Lys Tyr Thr Ala Tyr Pro Glu 

750 755 760 

GTG AAA GGT TCT TGG GGG AAT AAT GCT TTT AAC ATG ATG TTG GGA GCT 2595 
Val Lys Gly Ser Trp Gly Asn Asn Ala Phe Asn Met Met Leu Gly Ala 
765 770 775 

TCT TCT CAT TCT TAT CCT GAA TAC CTG CAT TGT TTT GAT ACC TAT GCT 2643 
Ser Ser His Ser Tyr Pro Glu Tyr Leu His Cys Phe Asp Thr Tyr Ala 
780 785 790 795 

CCA TAC ATC AAA CTG AAT CTG ACC TAT ATA CGT CAG GAC AGC TTC TCG 2691 
Pro Tyr Ile Lys Leu Asn Leu Thr Tyr Ile Arg Gln Asp Ser Phe Ser 
800 805 810 

GAG AAA GGT ACA GAA GGA AGA TCT TTT GAT GAC AGC AAC CTC TTC AAT 2739 
Glu Lys Gly Thr Glu Gly Arg Ser Phe Asp Asp Ser Asn Leu Phe Asn 
815 820 825 

TTA TCT TTG CCT ATA GGG GTG AAG TTT GAG AAG TTC TCT GAT TGT AAT 2787 
Leu Ser Leu Pro Ile Gly Val Lys Phe Glu Lys Phe Ser Asp Cys Asn 
830 835 840 

GAC TTT TCT TAT GAT CTG ACT TTA TCC TAT GTT CCT GAT CTT ATC CGC 2835 
Asp Phe Ser Tyr Asp Leu Thr Leu Ser Tyr Val Pro Asp Leu Ile Arg 
845 850 855 

AAT GAT CCC AAA TGC ACT ACA GCA CTT GTA ATC AGC GGA GCC TCT TGG 2883 
Asn Asp Pro Lys Cys Thr Thr Ala Leu Val Ile Ser Gly Ala Ser Trp 
860 865 870 875 

GAA ACT TAT GCC AAT AAC TTA GCA CGA CAG GCC TTG CAA GTG CGT GCA 2931 
Glu Thr Tyr Ala Asn Asn Leu Ala Arg Gln Ala Leu Gln Val Arg Ala 
880 885 890 

GGC AGT CAC TAC GCC TTC TCT CCT ATG TTT GAA GTG CTC GGC CAG TTT 2979 
Gly Ser His Tyr Ala Phe Ser Pro Met Phe Glu Val Leu Gly Gln Phe 
895 900 905 

GTC TTT GAA GTT CGT GGA TCC 3000 
Val Phe Glu Val Arg Gly Ser 
910 
(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 914 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 
(v) FRAGMENT TYPE: internal 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 



Met Lys Ser Gln Phe Ser Trp Leu Val Leu Ser Ser Thr Leu Ala Cys 
1 5 10 15 

Phe Thr Ser Cys Ser Thr Val Phe Ala Ala Thr Ala Glu Asn Ile Gly 
20 25 30 

Pro Ser Asp Ser Phe Asp Gly Ser Thr Asn Thr Gly Thr Tyr Thr Pro 
35 40 45 

Lys Asn Thr Thr Thr Gly Ile Asp Tyr Thr Leu Thr Gly Asp Ile Thr 
50 55 60 

Leu Gln Asn Leu Gly Asp Ser Ala Ala Leu Thr Lys Gly Cys Phe Ser 
65 70 75 80 

Asp Thr Thr Glu Ser Leu Ser Phe Ala Gly Lys Gly Tyr Ser Leu Ser 
85 90 95 

Phe Leu Asn Ile Lys Ser Ser Ala Glu Gly Ala Ala Leu Ser Val Thr 
100 105 110 

Thr Asp Lys Asn Leu Ser Leu Thr Gly Phe Ser Ser Leu Thr Phe Leu 
115 120 125 

Ala Ala Pro Ser Ser Val Ile Thr Thr Pro Ser Gly Lys Gly Ala Val 
130 135 140 

Lys Cys Gly Gly Asp Leu Thr Phe Asp Asn Asn Gly Thr Ile Leu Phe 
145 150 155 160 

Lys Gln Asp Tyr Cys Glu Glu Asn Gly Gly Ala Ile Ser Thr Lys Asn 
165 170 175 

Leu Ser Leu Lys Asn Ser Thr Gly Ser Ile Ser Phe Glu Gly Asn Lys 
180 185 190 

Ser Ser Ala Thr Gly Lys Lys Gly Gly Ala Ile Cys Ala Thr Gly Thr 
195 200 205 

Val Asp Ile Thr Asn Asn Thr Ala Pro Thr Leu Phe Ser Asn Asn Ile 
210 215 220 

Ala Glu Ala Ala Gly Gly Ala Ile Asn Ser Thr Gly Asn Cys Thr Ile 
225 230 235 240 

Thr Gly Asn Thr Ser Leu Val Phe Ser Glu Asn Ser Val Thr Ala Thr 
245 250 255 

Ala Gly Asn Gly Gly Ala Leu Ser Gly Asp Ala Asp Val Thr Ile Ser 
260 265 270 

Gly Asn Gln Ser Val Thr Phe Ser Gly Asn Gln Ala Val Ala Asn Gly 
275 280 285 

Gly Ala Ile Tyr Ala Lys Lys Leu Thr Leu Ala Ser Gly Gly Gly Gly 
290 295 300 

Gly Ile Ser Phe Ser Asn Asn Ile Val Gln Gly Thr Thr Ala Gly Asn 
305 310 315 320 

Gly Gly Ala Ile Ser Ile Leu Ala Ala Gly Glu Cys Ser Leu Ser Ala 
325 330 335 

Glu Ala Gly Asp Ile Thr Phe Asn Gly Asn Ala Ile Val Ala Thr Thr 
340 345 350 

Pro Gln Thr Thr Lys Arg Asn Ser Ile Asp Ile Gly Ser Thr Ala Lys 
355 360 365 

Ile Thr Asn Leu Arg Ala Ile Ser Gly His Ser Ile Phe Phe Tyr Asp 
370 375 380 

Pro Ile Thr Ala Asn Thr Ala Ala Asp Ser Thr Asp Thr Leu Asn Leu 
385 390 395 400 

Asn Lys Ala Asp Ala Gly Asn Ser Thr Asp Tyr Ser Gly Ser Ile Val 
405 410 415 

Phe Ser Gly Glu Lys Leu Ser Glu Asp Glu Ala Lys Val Ala Asp Asn 
420 425 430 

Leu Thr Ser Thr Leu Lys Gln Pro Val Thr Leu Thr Ala Gly Asn Leu 
435 440 445 

Val Leu Lys Arg Gly Val Thr Leu Asp Thr Lys Gly Phe Thr Gln Thr 
450 455 460 

Ala Gly Ser Ser Val Ile Met Asp Ala Gly Thr Thr Leu Lys Ala Ser 
465 470 475 480 

Thr Glu Glu Val Thr Leu Thr Gly Leu Ser Ile Pro Val Asp Ser Leu 
485 490 495 

Gly Glu Gly Lys Lys Val Val Ile Ala Ala Ser Ala Ala Ser Lys Asn 
500 505 510 

Val Ala Leu Ser Gly Pro Ile Leu Leu Leu Asp Asn Gln Gly Asn Ala 
515 520 525 

Tyr Glu Asn His Asp Leu Gly Lys Thr Gln Asp Phe Ser Phe Val Gln 
530 535 540 

Leu Ser Ala Leu Gly Thr Ala Thr Thr Thr Asp Val Pro Ala Val Pro 
545 550 555 560 

Thr Val Ala Thr Pro Thr His Tyr Gly Tyr Gln Gly Thr Trp Gly Met 
565 570 575 

Thr Trp Val Asp Asp Thr Ala Ser Thr Pro Lys Thr Lys Thr Ala Thr 
580 585 590 

Leu Ala Trp Thr Asn Thr Gly Tyr Leu Pro Asn Pro Glu Arg Gln Gly 
595 600 605 

Pro Leu Val Pro Asn Ser Leu Trp Gly Ser Phe Ser Asp Ile Gln Ala 
610 615 620 

Ile Gln Gly Val Ile Glu Arg Ser Ala Leu Thr Leu Cys Ser Asp Arg 
625 630 635 640 

Gly Phe Trp Ala Ala Gly Val Ala Asn Phe Leu Asp Lys Asp Lys Lys 
645 650 655 

Gly Glu Lys Arg Lys Tyr Arg His Lys Ser Gly Gly Tyr Ala Ile Gly 
660 665 670 

Gly Ala Ala Gln Thr Cys Ser Glu Asn Leu Ile Ser Phe Ala Phe Cys 
675 680 685 

Gln Leu Phe Gly Ser Asp Lys Asp Phe Leu Val Ala Lys Asn His Thr 
690 695 700 

Asp Thr Tyr Ala Gly Ala Phe Tyr Ile Gln His Ile Thr Glu Cys Ser 
705 710 715 720 

Gly Phe Ile Gly Cys Leu Leu Asp Lys Leu Pro Gly Ser Trp Ser His 
725 730 735 

Lys Pro Leu Val Leu Glu Gly Gln Leu Ala Tyr Ser His Val Ser Asn 
740 745 750 

Asp Leu Lys Thr Lys Tyr Thr Ala Tyr Pro Glu Val Lys Gly Ser Trp 
755 760 765 

Gly Asn Asn Ala Phe Asn Met Met Leu Gly Ala Ser Ser His Ser Tyr 
770 775 780 

Pro Glu Tyr Leu His Cys Phe Asp Thr Tyr Ala Pro Tyr Ile Lys Leu 
785 790 795 800 

Asn Leu Thr Tyr Ile Arg Gln Asp Ser Phe Ser Glu Lys Gly Thr Glu 
805 810 815 

Gly Arg Ser Phe Asp Asp Ser Asn Leu Phe Asn Leu Ser Leu Pro Ile 
820 825 830 

Gly Val Lys Phe Glu Lys Phe Ser Asp Cys Asn Asp Phe Ser Tyr Asp 
835 840 845 

Leu Thr Leu Ser Tyr Val Pro Asp Leu Ile Arg Asn Asp Pro Lys Cys 
850 855 860 

Thr Thr Ala Leu Val Ile Ser Gly Ala Ser Trp Glu Thr Tyr Ala Asn 
865 870 875 880 

Asn Leu Ala Arg Gln Ala Leu Gln Val Arg Ala Gly Ser His Tyr Ala 
885 890 895 

Phe Ser Pro Met Phe Glu Val Leu Gly Gln Phe Val Phe Glu Val Arg 
900 905 910 

Gly Ser 



(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1200 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ix) FEATURE: 

(A) NAME/KEY: Coding Sequence 

(B) LOCATION: 1...1200 
(D) OTHER INFORMATION: 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 

GAT CCT AAA AAT AAA GAG TAC ACA GGG ACC ATA CTC TTT TCT GGA GAA 48 
Asp Pro Lys Asn Lys Glu Tyr Thr Gly Thr Ile Leu Phe Ser Gly Glu 
1 5 10 15 

AAG AGT CTA GCA AAC GAT CCT AGG GAT TTT AAA TCT ACA ATC CCT CAG 96 
Lys Ser Leu Ala Asn Asp Pro Arg Asp Phe Lys Ser Thr Ile Pro Gln 
20 25 30 

AAC GTC AAC CTG TCT GCA GGA TAC TTA GTT ATT AAA GAG GGG GCC GAA 144 
Asn Val Asn Leu Ser Ala Gly Tyr Leu Val Ile Lys Glu Gly Ala Glu 
35 40 45 

GTC ACA GTT TCA AAA TTC ACG CAG TCT CCA GGA TCG CAT TTA GTT TTA 192 
Val Thr Val Ser Lys Phe Thr Gln Ser Pro Gly Ser His Leu Val Leu 
50 55 60 

GAT TTA GGA ACC AAA CTG ATA GCC TCT AAG GAA GAC ATT GCC ATC ACA 240 
Asp Leu Gly Thr Lys Leu Ile Ala Ser Lys Glu Asp Ile Ala Ile Thr 
65 70 75 80 

GGC CTC GCG ATA GAT ATA GAT AGC TTA AGC TCA TCC TCA ACA GCA GCT 288 
Gly Leu Ala Ile Asp Ile Asp Ser Leu Ser Ser Ser Ser Thr Ala Ala 
85 90 95 

GTT ATT AAA GCA AAC ACC GCA AAT AAA CAG ATA TCC GTG ACG GAC TCT 336 
Val Ile Lys Ala Asn Thr Ala Asn Lys Gln Ile Ser Val Thr Asp Ser 
100 105 110 

ATA GAA CTT ATC TCG CCT ACT GGC AAT GCC TAT GAA GAT CTC AGA ATG 384 
Ile Glu Leu Ile Ser Pro Thr Gly Asn Ala Tyr Glu Asp Leu Arg Met 
115 120 125 

AGA AAT TCA CAG ACG TTC CCT CTG CTC TCT TTA GAG CCT GGA GCC GGG 432 
Arg Asn Ser Gln Thr Phe Pro Leu Leu Ser Leu Glu Pro Gly Ala Gly 
130 135 140 

GGT AGT GTG ACT GTA ACT GCT GGA GAT TTC CTA CCG GTA AGT CCC CAT 480 
Gly Ser Val Thr Val Thr Ala Gly Asp Phe Leu Pro Val Ser Pro His 
145 150 155 160 

TAT GGT TTT CAA GGC AAT TGG AAA TTA GCT TGG ACA GGA ACT GGA AAC 528 
Tyr Gly Phe Gln Gly Asn Trp Lys Leu Ala Trp Thr Gly Thr Gly Asn 
165 170 175 

AAA GTT GGA GAA TTC TTC TGG GAT AAA ATA AAT TAT AAG CCT AGA CCT 576 
Lys Val Gly Glu Phe Phe Trp Asp Lys Ile Asn Tyr Lys Pro Arg Pro 
180 185 190 

GAA AAA GAA GGA AAT TTA GTT CCT AAT ATC TTG TGG GGG AAT GCT GTA 624 
Glu Lys Glu Gly Asn Leu Val Pro Asn Ile Leu Trp Gly Asn Ala Val 
195 200 205 

AAT GTC AGA TCC TTA ATG CAG GTT CAA GAG ACC CAT GCA TCG AGC TTA 672 
Asn Val Arg Ser Leu Met Gln Val Gln Glu Thr His Ala Ser Ser Leu 
210 215 220 

CAG ACA GAT CGA GGG CTG TGG ATC GAT GGA ATT GGG AAT TTC TTC CAT 720 
Gln Thr Asp Arg Gly Leu Trp Ile Asp Gly Ile Gly Asn Phe Phe His 
225 230 235 240 

GTA TCT GCC TCC GAA GAC AAT ATA AGG TAC CGT CAT AAC AGC GGT GGA 768 
Val Ser Ala Ser Glu Asp Asn Ile Arg Tyr Arg His Asn Ser Gly Gly 
245 250 255 

TAT GTT CTA TCT GTA AAT AAT GAG ATC ACA CCT AAG CAC TAT ACT TCG 816 
Tyr Val Leu Ser Val Asn Asn Glu Ile Thr Pro Lys His Tyr Thr Ser 
260 265 270 

ATG GCA TTT TCC CAA CTC TTT AGT AGA GAC AAA GAC TAT GCG GTT TCC 864 
Met Ala Phe Ser Gln Leu Phe Ser Arg Asp Lys Asp Tyr Ala Val Ser 
275 280 285 

AAC AAC GAA TAC AGA ATG TAT TTA GGA TCG TAT CTC TAT CAA TAT ACA 912 
Asn Asn Glu Tyr Arg Met Tyr Leu Gly Ser Tyr Leu Tyr Gln Tyr Thr 
290 295 300 

ACC TCC CTA GGG AAT ATT TTC CGT TAT GCT TCG CGT AAC CCT AAT GTA 960 
Thr Ser Leu Gly Asn Ile Phe Arg Tyr Ala Ser Arg Asn Pro Asn Val 
305 310 315 320 

AAC GTC GGG ATT CTC TCA AGA AGG TTT CTT CAA AAT CCT CTT ATG ATT 1008 
Asn Val Gly Ile Leu Ser Arg Arg Phe Leu Gln Asn Pro Leu Met Ile 
325 330 335 

TTT CAT TTT TTG TGT GCT TAT GGT CAT GCC ACC AAT GAT ATG AAA ACA 1056 
Phe His Phe Leu Cys Ala Tyr Gly His Ala Thr Asn Asp Met Lys Thr 
340 345 350 

GAC TAC GCA AAT TTC CCT ATG GTG AAA AAC AGC TGG AGA AAC AAT TGT 1104 
Asp Tyr Ala Asn Phe Pro Met Val Lys Asn Ser Trp Arg Asn Asn Cys 
355 360 365 

TGG GCT ATA AAA TGC GGA GGG AGC ATG CCT CTA TTG GTA TTT GAA AAC 1152 
Trp Ala Ile Lys Cys Gly Gly Ser Met Pro Leu Leu Val Phe Glu Asn 
370 375 380 

GGA AAA CTT TTC CAA GGT GCC ATC CCA TTT ATG AAA CTA CAA TTA GTT 1200 
Gly Lys Leu Phe Gln Gly Ala Ile Pro Phe Met Lys Leu Gln Leu Val 
385 390 395 400 
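
As an illustrative aside that is not part of the original listing, the pairing of each codon row with the amino acid row beneath it can be reproduced with the standard genetic code. The sketch below is a minimal Python example under that assumption; the names `CODON_TABLE`, `THREE_LETTER` and `translate` are hypothetical helpers introduced only for illustration, and the input is the first codon row of SEQ ID NO: 27.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (third base varying fastest). '*' marks a stop codon.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# One-letter to three-letter residue codes, as used in the listing.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def translate(dna):
    """Translate a coding sequence (whitespace ignored) into three-letter codes."""
    seq = "".join(dna.split()).upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return [THREE_LETTER[CODON_TABLE[seq[i:i + 3]]] for i in range(0, len(seq), 3)]


# First codon row of SEQ ID NO: 27 (nucleotides 1-48).
row1 = "GAT CCT AAA AAT AAA GAG TAC ACA GGG ACC ATA CTC TTT TCT GGA GAA"
print(" ".join(translate(row1)))
```

Applied to a full 1200 base pair coding sequence, the same function would return 400 residues, which agrees with the stated length of SEQ ID NO: 28.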



(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 400 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 
(v) FRAGMENT TYPE: internal 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 



Asp Pro Lys Asn Lys Glu Tyr Thr Gly Thr Ile Leu Phe Ser Gly Glu 
1 5 10 15 

Lys Ser Leu Ala Asn Asp Pro Arg Asp Phe Lys Ser Thr Ile Pro Gln 
20 25 30 

Asn Val Asn Leu Ser Ala Gly Tyr Leu Val Ile Lys Glu Gly Ala Glu 
35 40 45 

Val Thr Val Ser Lys Phe Thr Gln Ser Pro Gly Ser His Leu Val Leu 
50 55 60 

Asp Leu Gly Thr Lys Leu Ile Ala Ser Lys Glu Asp Ile Ala Ile Thr 
65 70 75 80 

Gly Leu Ala Ile Asp Ile Asp Ser Leu Ser Ser Ser Ser Thr Ala Ala 
85 90 95 

Val Ile Lys Ala Asn Thr Ala Asn Lys Gln Ile Ser Val Thr Asp Ser 
100 105 110 

Ile Glu Leu Ile Ser Pro Thr Gly Asn Ala Tyr Glu Asp Leu Arg Met 
115 120 125 

Arg Asn Ser Gln Thr Phe Pro Leu Leu Ser Leu Glu Pro Gly Ala Gly 
130 135 140 

Gly Ser Val Thr Val Thr Ala Gly Asp Phe Leu Pro Val Ser Pro His 
145 150 155 160 

Tyr Gly Phe Gln Gly Asn Trp Lys Leu Ala Trp Thr Gly Thr Gly Asn 
165 170 175 

Lys Val Gly Glu Phe Phe Trp Asp Lys Ile Asn Tyr Lys Pro Arg Pro 
180 185 190 

Glu Lys Glu Gly Asn Leu Val Pro Asn Ile Leu Trp Gly Asn Ala Val 
195 200 205 

Asn Val Arg Ser Leu Met Gln Val Gln Glu Thr His Ala Ser Ser Leu 
210 215 220 

Gln Thr Asp Arg Gly Leu Trp Ile Asp Gly Ile Gly Asn Phe Phe His 
225 230 235 240 

Val Ser Ala Ser Glu Asp Asn Ile Arg Tyr Arg His Asn Ser Gly Gly 
245 250 255 

Tyr Val Leu Ser Val Asn Asn Glu Ile Thr Pro Lys His Tyr Thr Ser 
260 265 270 

Met Ala Phe Ser Gln Leu Phe Ser Arg Asp Lys Asp Tyr Ala Val Ser 
275 280 285 

Asn Asn Glu Tyr Arg Met Tyr Leu Gly Ser Tyr Leu Tyr Gln Tyr Thr 
290 295 300 

Thr Ser Leu Gly Asn Ile Phe Arg Tyr Ala Ser Arg Asn Pro Asn Val 
305 310 315 320 

Asn Val Gly Ile Leu Ser Arg Arg Phe Leu Gln Asn Pro Leu Met Ile 
325 330 335 

Phe His Phe Leu Cys Ala Tyr Gly His Ala Thr Asn Asp Met Lys Thr 
340 345 350 

Asp Tyr Ala Asn Phe Pro Met Val Lys Asn Ser Trp Arg Asn Asn Cys 
355 360 365 

Trp Ala Ile Lys Cys Gly Gly Ser Met Pro Leu Leu Val Phe Glu Asn 
370 375 380 

Gly Lys Leu Phe Gln Gly Ala Ile Pro Phe Met Lys Leu Gln Leu Val 
385 390 395 400 



(2) INFORMATION FOR SEQ ID NO: 29: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1830 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(ix) FEATURE: 

(A) NAME/KEY: Coding Sequence 

(B) LOCATION: 1...1830 
(D) OTHER INFORMATION: 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 

GAT CTC ACA TTA GGG AGT CGT GAC AGT TAT AAT GGT GAT ACA AGC ACC 48 
Asp Leu Thr Leu Gly Ser Arg Asp Ser Tyr Asn Gly Asp Thr Ser Thr 
1 5 10 15 

ACA GAA TTT ACT CCT AAA GCG GCA ACT TCT GAT GCT AGT GGC ACG ACC 96 
Thr Glu Phe Thr Pro Lys Ala Ala Thr Ser Asp Ala Ser Gly Thr Thr 
20 25 30 

TAT ATT CTC GAT GGG GAT GTC TCG ATA AGC CAA GCA GGG AAA CAA ACG 144 
Tyr Ile Leu Asp Gly Asp Val Ser Ile Ser Gln Ala Gly Lys Gln Thr 
35 40 45 

AGC TTA ACC ACA AGT TGT TCT TCT AAC ACT GCA GGA AAT CTT ACC TTC 192 
Ser Leu Thr Thr Ser Cys Phe Ser Asn Thr Ala Gly Asn Leu Thr Phe 
50 55 60 

TTA GGG AAC GGA TTT TCT CTT CAT TTT GAC AAT ATT ATT TCG TCT ACT 240 
Leu Gly Asn Gly Phe Ser Leu His Phe Asp Asn Ile Ile Ser Ser Thr 
65 70 75 80 

GTT GCA GGT GTT GTT GTT AGC AAT ACA GCA GCT TCT GGG ATT ACG AAA 288 
Val Ala Gly Val Val Val Ser Asn Thr Ala Ala Ser Gly Ile Thr Lys 
85 90 95 

TTC TCA GGA TTT TCA ACT CTT CGG ATG CTT GCA GCT CCT AGG ACC ACA 336 
Phe Ser Gly Phe Ser Thr Leu Arg Met Leu Ala Ala Pro Arg Thr Thr 
100 105 110 

GGT AAA GGA GCC ATT AAA ATT ACC GAT GGT CTG GTG TTT GAG AGT ATA 384 
Gly Lys Gly Ala Ile Lys Ile Thr Asp Gly Leu Val Phe Glu Ser Ile 
115 120 125 

GGG AAT CTT GAT CCG ATT ACT GTA ACA GGA TCG ACA TCT GTT GCT GAT 432 
Gly Asn Leu Asp Pro Ile Thr Val Thr Gly Ser Thr Ser Val Ala Asp 
130 135 140 

GCT CTC AAT ATT AAT AGC CCT GAT ACT GGA GAT AAC AAA GAG TAT ACG 480 
Ala Leu Asn Ile Asn Ser Pro Asp Thr Gly Asp Asn Lys Glu Tyr Thr 
145 150 155 160 

GGA ACC ATA GTC TTT TCT GGA GAG AAG CTC ACG GAG GCA GAA GCT AAA 528 
Gly Thr Ile Val Phe Ser Gly Glu Lys Leu Thr Glu Ala Glu Ala Lys 
165 170 175 

GAT GAG AAG AAC CGC ACT TCT AAA TTA CTT CAA AAT GTT GCT TTT AAA 576 
Asp Glu Lys Asn Arg Thr Ser Lys Leu Leu Gln Asn Val Ala Phe Lys 
180 185 190 

AAT GGG ACT GTA GTT TTA AAA GGT GAT GTC GTT TTA AGT GCG AAC GGT 624 
Asn Gly Thr Val Val Leu Lys Gly Asp Val Val Leu Ser Ala Asn Gly 
195 200 205 

TTC TCT CAG GAT GCA AAC TCT AAG TTG ATT ATG GAT TTA GGG ACG TCG 672 
Phe Ser Gln Asp Ala Asn Ser Lys Leu Ile Met Asp Leu Gly Thr Ser 
210 215 220 

TTG GTT GCA AAC ACC GAA AGT ATC GAG TTA ACG AAT TTG GAA ATT AAT 720 
Leu Val Ala Asn Thr Glu Ser Ile Glu Leu Thr Asn Leu Glu Ile Asn 
225 230 235 240 

ATA GAC TCT CTC AGG AAC GGG AAA AAG ATA AAA CTC AGT GCT GCC ACA 768 
Ile Asp Ser Leu Arg Asn Gly Lys Lys Ile Lys Leu Ser Ala Ala Thr 
245 250 255 

GCT CAG AAA GAT ATT CGT ATA GAT CGT CCT GTT GTA CTG GCA ATT AGC 816 
Ala Gln Lys Asp Ile Arg Ile Asp Arg Pro Val Val Leu Ala Ile Ser 
260 265 270 

GAT GAG AGT TTT TAT CAA AAT GGC TTT TTG AAT GAG GAC CAT TCC TAT 864 
Asp Glu Ser Phe Tyr Gln Asn Gly Phe Leu Asn Glu Asp His Ser Tyr 
275 280 285 

GAT GGG ATT CTT GAG TTA GAT GCT GGG AAA GAC ATC GTG ATT TCT GCA 912 
Asp Gly Ile Leu Glu Leu Asp Ala Gly Lys Asp Ile Val Ile Ser Ala 
290 295 300 

GAT TCT CGC AGT ATA GAT GCT GTA CAA TCT CCG TAT GGC TAT CAG GGA 960 
Asp Ser Arg Ser Ile Asp Ala Val Gln Ser Pro Tyr Gly Tyr Gln Gly 
305 310 315 320 

AAG TGG ACG ATC AAT TGG TCT ACT GAT GAT AAG AAA GCT ACG GTT TCT 1008 
Lys Trp Thr Ile Asn Trp Ser Thr Asp Asp Lys Lys Ala Thr Val Ser 
325 330 335 

TGG GCG AAG CAG AGT TTT AAT CCC ACT GCT GAG CAG GAG GCT CCG TTA 1056 
Trp Ala Lys Gln Ser Phe Asn Pro Thr Ala Glu Gln Glu Ala Pro Leu 
340 345 350 

GTT CCT AAT CTT CTT TGG GGT TCT TTT ATA GAT GTT CGT TCC TTC CAG 1104 
Val Pro Asn Leu Leu Trp Gly Ser Phe Ile Asp Val Arg Ser Phe Gln 
355 360 365 

AAT TTT ATA GAG CTA GGT ACT GAA GGT GCT CCT TAC GAA AAG AGA TTT 1152 
Asn Phe Ile Glu Leu Gly Thr Glu Gly Ala Pro Tyr Glu Lys Arg Phe 
370 375 380 

TGG GTT GCA GGC ATT TCC AAT GTT TTG CAT AGG AGC GGT CGT GAA AAT 1200 
Trp Val Ala Gly Ile Ser Asn Val Leu His Arg Ser Gly Arg Glu Asn 
385 390 395 400 

CAA AGG AAA TTC CGT CAT GTG AGT GGA GGT GCT GTA GTA GGT GCT AGC 1248 
Gln Arg Lys Phe Arg His Val Ser Gly Gly Ala Val Val Gly Ala Ser 
405 410 415 

ACG AGG ATG CCG GGT GGT GAT ACC TTG TCT CTG GGT TTT GCT CAG CTC 1296 
Thr Arg Met Pro Gly Gly Asp Thr Leu Ser Leu Gly Phe Ala Gln Leu 
420 425 430 

TTT GCG CGT GAC AAA GAC TAC TTT ATG AAT ACC AAT TTC GCA AAG ACC 1344 
Phe Ala Arg Asp Lys Asp Tyr Phe Met Asn Thr Asn Phe Ala Lys Thr 
435 440 445 

TAC GCA GGA TCT TTA CGT TTG CAG CAC GAT GCT TCC CTA TAC TCT GTG 1392 
Tyr Ala Gly Ser Leu Arg Leu Gln His Asp Ala Ser Leu Tyr Ser Val 
450 455 460 

GTG AGT ATC CTT TTA GGA GAG GGA GGA CTC CGC GAG ATC CTG TTG CCT 1440 
Val Ser Ile Leu Leu Gly Glu Gly Gly Leu Arg Glu Ile Leu Leu Pro 
465 470 475 480 

TAT GTT TCC AAT ACT CTG CCG TGC TCT TTC TAT GGG CAG CTT AGC TAC 1488 
Tyr Val Ser Asn Thr Leu Pro Cys Ser Phe Tyr Gly Gln Leu Ser Tyr 
485 490 495 

GGC CAT ACG GAT CAT CGC ATG AAG ACC GAG TCT CTA CCC CCC CCC CCC 1536 
Gly His Thr Asp His Arg Met Lys Thr Glu Ser Leu Pro Pro Pro Pro 
500 505 510 

CCG ACG CTC TCG ACG GAT CAT ACT TCT TGG GGA GGA TAT GTC TGG GCT 1584 
Pro Thr Leu Ser Thr Asp His Thr Ser Trp Gly Gly Tyr Val Trp Ala 
515 520 525 

GGA GAG CTG GGA ACT CGA GTT GCT GTT GAA AAT ACC AGC GGC AGA GGA 1632 
Gly Glu Leu Gly Thr Arg Val Ala Val Glu Asn Thr Ser Gly Arg Gly 
530 535 540 

TTT TTC CGA GAG TAC ACT CCA TTT GTA AAA GTC CAA GCT GTT TAC TCG 1680 
Phe Phe Arg Glu Tyr Thr Pro Phe Val Lys Val Gln Ala Val Tyr Ser 
545 550 555 560 



rnr raa naT arcr* ttt nrw nnn rr* en* r*rrr> -*^»m nnm r»nm - e 

Arg Gln Asp Ser Phe Val Glu Leu Gly Ala Ile Ser Arg Asp Phe Ser 
565 570 575 

GAT TCG CAT CTT TAT AAC CTT GCG ATT CCT CTT GGA ATC AAG TTA GAG 1776 
Asp Ser His Leu Tyr Asn Leu Ala Ile Pro Leu Gly Ile Lys Leu Glu 
580 585 590 

AAA CGG TTT GCA GAG CAA TAT TAT CAT GTT GTT GCG ATG TAT TCT CCA 1824 
Lys Arg Phe Ala Glu Gln Tyr Tyr His Val Val Ala Met Tyr Ser Pro 
595 600 605 

GAT GTT 1830 
Asp Val 
610 



(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 610 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 
(v) FRAGMENT TYPE: internal 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 



Asp Leu Thr Leu Gly Ser Arg Asp Ser Tyr Asn Gly Asp Thr Ser Thr 
1 5 10 15 

Thr Glu Phe Thr Pro Lys Ala Ala Thr Ser Asp Ala Ser Gly Thr Thr 
20 25 30 

Tyr Ile Leu Asp Gly Asp Val Ser Ile Ser Gln Ala Gly Lys Gln Thr 
35 40 45 

Ser Leu Thr Thr Ser Cys Phe Ser Asn Thr Ala Gly Asn Leu Thr Phe 
50 55 60 

Leu Gly Asn Gly Phe Ser Leu His Phe Asp Asn Ile Ile Ser Ser Thr 
65 70 75 80 

Val Ala Gly Val Val Val Ser Asn Thr Ala Ala Ser Gly Ile Thr Lys 
85 90 95 

Phe Ser Gly Phe Ser Thr Leu Arg Met Leu Ala Ala Pro Arg Thr Thr 
100 105 110 

Gly Lys Gly Ala Ile Lys Ile Thr Asp Gly Leu Val Phe Glu Ser Ile 
115 120 125 

Gly Asn Leu Asp Pro Ile Thr Val Thr Gly Ser Thr Ser Val Ala Asp 
130 135 140 

Ala Leu Asn Ile Asn Ser Pro Asp Thr Gly Asp Asn Lys Glu Tyr Thr 
145 150 155 160 

Gly Thr Ile Val Phe Ser Gly Glu Lys Leu Thr Glu Ala Glu Ala Lys 
165 170 175 

Asp Glu Lys Asn Arg Thr Ser Lys Leu Leu Gln Asn Val Ala Phe Lys 
180 185 190 

Asn Gly Thr Val Val Leu Lys Gly Asp Val Val Leu Ser Ala Asn Gly 
195 200 205 

Phe Ser Gln Asp Ala Asn Ser Lys Leu Ile Met Asp Leu Gly Thr Ser 
210 215 220 

Leu Val Ala Asn Thr Glu Ser Ile Glu Leu Thr Asn Leu Glu Ile Asn 
225 230 235 240 

Ile Asp Ser Leu Arg Asn Gly Lys Lys Ile Lys Leu Ser Ala Ala Thr 
245 250 255 

Ala Gln Lys Asp Ile Arg Ile Asp Arg Pro Val Val Leu Ala Ile Ser 
260 265 270 

Asp Glu Ser Phe Tyr Gln Asn Gly Phe Leu Asn Glu Asp His Ser Tyr 
275 280 285 

Asp Gly Ile Leu Glu Leu Asp Ala Gly Lys Asp Ile Val Ile Ser Ala 
290 295 300 

Asp Ser Arg Ser Ile Asp Ala Val Gln Ser Pro Tyr Gly Tyr Gln Gly 
305 310 315 320 

Lys Trp Thr Ile Asn Trp Ser Thr Asp Asp Lys Lys Ala Thr Val Ser 
325 330 335 

Trp Ala Lys Gln Ser Phe Asn Pro Thr Ala Glu Gln Glu Ala Pro Leu 
340 345 350 

Val Pro Asn Leu Leu Trp Gly Ser Phe Ile Asp Val Arg Ser Phe Gln 
355 360 365 

Asn Phe Ile Glu Leu Gly Thr Glu Gly Ala Pro Tyr Glu Lys Arg Phe 
370 375 380 

Trp Val Ala Gly Ile Ser Asn Val Leu His Arg Ser Gly Arg Glu Asn 
385 390 395 400 

Gln Arg Lys Phe Arg His Val Ser Gly Gly Ala Val Val Gly Ala Ser 
405 410 415 

Thr Arg Met Pro Gly Gly Asp Thr Leu Ser Leu Gly Phe Ala Gln Leu 
420 425 430 

Phe Ala Arg Asp Lys Asp Tyr Phe Met Asn Thr Asn Phe Ala Lys Thr 
435 440 445 

Tyr Ala Gly Ser Leu Arg Leu Gln His Asp Ala Ser Leu Tyr Ser Val 
450 455 460 

Val Ser Ile Leu Leu Gly Glu Gly Gly Leu Arg Glu Ile Leu Leu Pro 
465 470 475 480 

Tyr Val Ser Asn Thr Leu Pro Cys Ser Phe Tyr Gly Gln Leu Ser Tyr 
485 490 495 

Gly His Thr Asp His Arg Met Lys Thr Glu Ser Leu Pro Pro Pro Pro 
500 505 510 

Pro Thr Leu Ser Thr Asp His Thr Ser Trp Gly Gly Tyr Val Trp Ala 
515 520 525 

Gly Glu Leu Gly Thr Arg Val Ala Val Glu Asn Thr Ser Gly Arg Gly 
530 535 540 

Phe Phe Arg Glu Tyr Thr Pro Phe Val Lys Val Gln Ala Val Tyr Ser 
545 550 555 560 

Arg Gln Asp Ser Phe Val Glu Leu Gly Ala Ile Ser Arg Asp Phe Ser 
565 570 575 

Asp Ser His Leu Tyr Asn Leu Ala Ile Pro Leu Gly Ile Lys Leu Glu 
580 585 590 

Lys Arg Phe Ala Glu Gln Tyr Tyr His Val Val Ala Met Tyr Ser Pro 
595 600 605 

Asp Val 
610 
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Claims 

1. Species specific diagnostic test for identifying 
infection of a mammal, such as a human, with Chlamydia 
pneumoniae, said test comprising detecting in a patient or in 
a patient sample the presence of antibodies against one or 
more proteins from the outer membrane of Chlamydia pneumoniae, 
said proteins being of a molecular weight of 100.3-89.6 kDa 
or of 56.1 kDa, or detecting the presence of nucleic acid 
fragments encoding said outer membrane proteins. 

2. Diagnostic test according to claim 1, wherein the outer 
membrane protein has the sequence as shown in SEQ ID NO: 2, 
SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 10, SEQ 
ID NO: 12, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 18, SEQ 
ID NO: 20, SEQ ID NO: 22, or in SEQ ID NO: 24, or a variant 
or subsequence thereof. 

3. Diagnostic test according to claim 1, wherein the nucleic 
acid fragment has the sequence shown in SEQ ID NO: 1, SEQ ID 
NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9, SEQ ID NO: 
11, SEQ ID NO: 13, SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 
19, SEQ ID NO: 21, or in SEQ ID NO: 23, or a variant or 
subsequence thereof. 

4. Diagnostic test according to claim 3, wherein detection of 
nucleic acid fragments is obtained by using nucleic acid 
amplification. 

5. Diagnostic test according to claim 4, wherein detection 
of nucleic acid fragments is obtained by using polymerase 
chain reaction. 

6. A nucleic acid fragment derived from Chlamydia pneumoniae 
comprising the nucleotide sequence SEQ ID NO: 1, SEQ ID NO: 
3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9, SEQ ID NO: 11, 
SEQ ID NO: 13, SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 19, 
SEQ ID NO: 21, or SEQ ID NO: 23, or a variant or subsequence 
of said nucleotide sequence which has a sequence homology of 
at least 50% with any of the sequences mentioned. 

7. A protein derived from Chlamydia pneumoniae having the 
amino acid sequence shown in SEQ ID NO: 2, SEQ ID NO: 4, SEQ 
ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 12, SEQ ID 
NO: 14, SEQ ID NO: 16, SEQ ID NO: 18, SEQ ID NO: 20, SEQ ID 
NO: 22, or SEQ ID NO: 24, or a variant or subsequence thereof 
having a sequence similarity of at least 50% and a similar 
biological function. 

8. Polyclonal monospecific antibody against the protein 
with the sequence shown in SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID 
NO: 6, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 12, SEQ ID NO: 
14, SEQ ID NO: 16, SEQ ID NO: 18, SEQ ID NO: 20, SEQ ID NO: 
22, or SEQ ID NO: 24, or a variant or subsequence thereof. 



9. A diagnostic kit for the diagnosis of infection of a 
mammal, such as a human, with Chlamydia pneumoniae, said kit 
comprising a protein with the amino acid sequence SEQ ID NO: 
2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 10, 
SEQ ID NO: 12, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 18, 
SEQ ID NO: 20, SEQ ID NO: 22, or SEQ ID NO: 24, or a variant 
or subsequence thereof. 



10. A diagnostic kit for the diagnosis of infection of a 
mammal, such as a human, with Chlamydia pneumoniae, said kit 
comprising antibodies against a protein with the amino acid 
sequence SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 
8, SEQ ID NO: 10, SEQ ID NO: 12, SEQ ID NO: 14, SEQ ID NO: 
16, SEQ ID NO: 18, SEQ ID NO: 20, SEQ ID NO: 22, or SEQ ID 
NO: 24, or a variant or subsequence thereof. 

11. A diagnostic kit for the diagnosis of infection of a 
mammal, such as a human, with Chlamydia pneumoniae, said kit 
comprising a nucleic acid fragment with the sequence SEQ ID 
NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 
9, SEQ ID NO: 11, SEQ ID NO: 13, SEQ ID NO: 15, SEQ ID NO: 
17, SEQ ID NO: 19, SEQ ID NO: 21, or SEQ ID NO: 23, or a 
variant or subsequence thereof. 

12. A composition for immunizing a mammal, such as a 
human, against Chlamydia pneumoniae, said composition 
comprising a protein with the amino acid sequence shown in 
SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ 
ID NO: 10, SEQ ID NO: 12, SEQ ID NO: 14, SEQ ID NO: 16, SEQ 
ID NO: 18, SEQ ID NO: 20, SEQ ID NO: 22, or SEQ ID NO: 24, or 
a variant or subsequence thereof. 

13. Use of a protein with the sequence shown in SEQ ID 
NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 
10, SEQ ID NO: 12, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 
18, SEQ ID NO: 20, SEQ ID NO: 22, or SEQ ID NO: 24, or a 
variant or subsequence thereof in diagnosis of infection of a 
mammal, such as a human, with Chlamydia pneumoniae. 

14. Use of the protein with the sequence shown in SEQ ID 
NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 
10, SEQ ID NO: 12, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 
18, SEQ ID NO: 20, SEQ ID NO: 22, or SEQ ID NO: 24, or a 
variant or subsequence thereof in an undenatured form, in 
diagnosis of infection of a mammal, such as a human, with 
Chlamydia pneumoniae. 

15. Use of a protein with the sequence shown in SEQ ID 
NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 
10, SEQ ID NO: 12, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 
18, SEQ ID NO: 20, SEQ ID NO: 22, or SEQ ID NO: 24, or a 
variant or subsequence thereof, for immunizing a mammal, such 
as a human, against Chlamydia pneumoniae. 

16. Use of the protein with the sequence shown in SEQ ID 
NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 
10, SEQ ID NO: 12, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 
18, SEQ ID NO: 20, SEQ ID NO: 22, or SEQ ID NO: 24, or a 
variant or subsequence thereof in an undenatured form, for 
immunizing a mammal, such as a human, against Chlamydia 
pneumoniae. 

17. Use of a nucleic acid fragment with the nucleotide 
sequence shown in SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, 
SEQ ID NO: 7, SEQ ID NO: 9, SEQ ID NO: 11, SEQ ID NO: 13, SEQ 
ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 21, or 
SEQ ID NO: 23, or a variant or subsequence of said nucleotide 
sequence which has a sequence homology of at least 50% with 
any of the sequences mentioned, for immunizing a mammal, such 
as a human, against Chlamydia pneumoniae. 
Immunoblotting of C. pneumoniae EB; lanes 1-3 heated to 100°C in SDS-sample buffer, 
lanes 4-6 unheated. Lane 1 reacted with rabbit anti-C. pneumoniae OMC; lanes 2 and 4 with 
pre-serum; lanes 3 and 5 with polyclonal rabbit anti pEXl -1 fusion protein; lane 6 with MAb 26.1. 
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Fig. 12 



